

Attorney's Docket No.: 119385-00028 / 1607



IN THE UNITED STATES PATENT AND TRADEMARK OFFICE
BEFORE THE BOARD OF PATENT APPEALS AND INTERFERENCES 



Appellant: Madison et al.
Appl. No.: 09/776,191
Conf. No.: 3237
Filed: February 2, 2001
Title: NUCLEIC ACID MOLECULES ENCODING TRANSMEMBRANE SERINE PROTEASES, THE ENCODED PROTEINS AND METHODS BASED THEREON
Art Unit: 1652
Examiner: Yong D. Pak



Mail Stop Appeal Brief - Patents 
Commissioner for Patents 
P.O. Box 1450 
Alexandria, VA 22313-1450 

APPELLANT'S APPEAL BRIEF 

Sir: 

Appellant submits this Appeal Brief in support of the Notice of Appeal, filed on 
August 14, 2008. This Appeal is from the Final Rejection in the Office Action, dated March 
26, 2008. The Appeal Brief is filed with a five-month Extension of Time under Rule 136(a). 






CERTIFICATE OF MAILING BY "EXPRESS MAIL" 
"Express Mail" Mailing Label Number EV 740126652 US 
Date of Deposit: March 16, 2009 

I hereby certify that this paper is being deposited with the United 
States Postal "Express Mail Post Office to Addressee" Service under 
37 CFR §1.10 on the date indicated above and is addressed to: Mail 
Stop Appeal Brief-Patents, Commissioner for Patents, U.S. Patent and



Applicant: Madison et al.
Serial No.: 09/776,191
Filed: February 2, 2001
Attorney's Docket No.: 119385-00028 / 1607
APPELLANT'S APPEAL BRIEF
Customer Number: 77202



I. REAL PARTY IN INTEREST



The real party in interest for the above-identified patent application on Appeal is
Dendreon Corporation by virtue of an Assignment recorded May 20, 2002 at reel 014703,
frame 0441 in the United States Patent and Trademark Office.






II. RELATED APPEALS AND INTERFERENCES 

Appellant's legal representative and the Assignee of the above-identified patent 
application do not know of any prior or pending appeals, interferences or judicial proceedings 
that may be related to, directly affect or be directly affected by or have a bearing on the 
Board's decision with respect to the above-identified Appeal. 




III. STATUS OF CLAIMS 
Claims 1, 10-13, 20, 34-36, 40-46, 48-55, 108, 109, 113-116, 118-120 and 122-126
are pending in the above-identified patent application. Claims 10, 43-46, 48-55, 108, 109,
115, 116, 118-120 and 122-126 are withdrawn from consideration, but are retained for
possible rejoinder upon allowance of a generic claim. Claims 1, 11-13, 20, 34-36, 40-42, 113
and 114 are rejected. Therefore, Claims 1, 11-13, 20, 34-36, 40-42, 113 and 114 are the
subject of this appeal. A copy of the appealed claims, and all pending claims, is included in
the Claims Appendix.





IV. STATUS OF AMENDMENTS

No amendment was filed subsequent to the final rejection. Appellant filed a Notice of 
Appeal on August 14, 2008 (mailed on that date via Express mail certificate of mailing). 

Appellant attaches a copy of the Final Office Action as Exhibit 1 in the Evidence 
Appendix.



V. SUMMARY OF CLAIMED SUBJECT MATTER 

The following is a brief discussion of the claimed subject matter. As
described and defined in the application (see, e.g., page 7, last paragraph, through page 8; and
page 18, line 13, through page 19), transmembrane serine proteases (hereinafter MTSPs) are a
known family of serine proteases. Their identity and sequences are known, and the prior art
teaches that these proteases require activation and cleavage for activity. The active form is
typically a two chain or other multi-chain form. There is no teaching or suggestion in any art
that isolated protease domains of the protease as a single chain have activity, nor is there any
teaching or suggestion for isolating such a domain. Independent claim 1 is directed to isolated
single chain protease domains of an MTSP that are modified by replacing a free cysteine with
another amino acid; all other claims depend thereon. The free cysteine in the protease
domain is not free in the activated full-length molecule. Modification of the single chain
protease domain by replacing the free cysteine prevents aggregation that occurs by virtue of
interaction among the free cysteines of different molecules. Since none of the art suggests that
the isolated protease domain has activity, none can suggest modifying the isolated protease
domain to avoid aggregation, which will impact activity.

As defined in the application (pages 18-20), an MTSP family member is: 

As used herein, "transmembrane serine protease (MTSP)" refers to a family of 
transmembrane serine proteases that share common structural features as described 
herein (see, also Hooper et al. (2001) J. Biol. Chem. 276:857-860). Thus, 
reference, for example, to "MTSP" encompasses all proteins encoded by the MTSP 
gene family, including but are not limited to: MTSP1, MTSP3, MTSP4 and
MTSP6, or an equivalent molecule obtained from any other source or that has been
prepared synthetically or that exhibits the same activity. Other MTSPs include, but
are not limited to, corin, enteropeptidase, human airway trypsin-like protease
(HAT), MTSP1, TMPRSS2, and TMPRSS4. Sequences of encoding nucleic
molecules and the encoded amino acid sequences of exemplary MTSPs and/or 
domains thereof are set forth in SEQ ID Nos. 1-12, 49, 50 and 61-72. The term also 
encompass MTSPs with conservative amino acid substitutions that do not 
substantially alter activity of each member, and also encompasses splice variants 
thereof. Suitable conservative substitutions of amino acids are known to those of 
skill in this art and may be made generally without altering the biological activity of 
the resulting molecule. Of particular interest are MTSPs of mammalian, including 
human, origin. Those of skill in this art recognize that, in general, single amino 
acid substitutions in non-essential regions of a polypeptide do not substantially alter 
biological activity (see, e.g., Watson et al. Molecular Biology of the Gene, 4th 
Edition, 1987, The Benjamin/Cummings Pub. Co., p. 224). 

The application identifies the known members of the family (corin, enteropeptidase, human
airway trypsin-like protease (HAT), hepsin, MTSP1, TMPRSS2, TMPRSS4 and TADG-12),
and provides sequences of numerous family members and also provides new family members
(e.g., MTSP3, MTSP4 and MTSP6). Pages 10-12 reference sequence identifiers and/or
references providing the sequences of each member of the family:

. . . corin (accession nos. AF133845 and AB013874; see, Yan et al. (1999) J.
Biol. Chem. 274:14926-14938; Tomita et al. (1998) J. Biochem. 124:784-789; Uan
et al. (2000) Proc. Natl. Acad. Sci. U.S.A. 97:8525-8529; SEQ ID Nos. 61 and 62
for the human protein); enteropeptidase (also designated enterokinase; accession
no. U09860 for the human protein; see, Kitamoto et al. (1995) Biochem. 27: 4562-
4568; Yahagi et al. (1996) Biochem. Biophys. Res. Commun. 219:806-812;
Kitamoto et al. (1994) Proc. Natl. Acad. Sci. U.S.A. 91:7588-7592; Matsushima et
al. (1994) J. Biol. Chem. 269:19976-19982; see SEQ ID Nos. 63 and 64 for the
human protein); human airway trypsin-like protease (HAT; accession no.
AB002134; see Yamaoka et al. J. Biol. Chem. 273:11894-11901; SEQ ID Nos. 65
and 66 for the human protein); hepsin (see, accession nos. M18930, AF030065,
X70900; Leytus et al. (1988) Biochem. 27: 11895-11901; Vu et al. (1997) J. Biol.
Chem. 272:31315-31320; and Farley et al. (1993) Biochem. Biophys. Acta
1173:350-352; SEQ ID Nos. 67 and 68 for the human protein); TMPRS2 (see,
Accession Nos. U75329 and AF113596; Paoloni-Giacobino et al. (1997) Genomics
44:309-320; and Jacquinet et al. (2000) FEBS Lett. 468: 93-100; SEQ ID Nos. 69
and 70 for the human protein) TMPRSS4 (see, Accession No. NM 016425;
Wallrapp et al. (2000) Cancer 60:2602-2606; SEQ ID Nos. 71 and 72 for the human
protein); and TADG-12 (also designated MTSP6, see SEQ ID Nos. 11 and 12; see
International PCT application No. WO 00/52044, which claims priority to U.S.
application Serial No. 09/261,416).

. . . Exemplary MTSPs (see, e.g., SEQ ID No. 1-12, 49 and 50) are provided 
herein, as are the single chain protease domains thereof as follows: SEQ ID Nos. 1, 
2, 49 and 50 set forth amino acid and nucleic acid sequences of MTSP1 and the
protease domain thereof; SEQ ID No. 3 sets forth the MTSP3 nucleic acid
sequence and SEQ ID No. 4 the encoded MTSP3 amino acids; SEQ ID No. 5
MTSP4 a nucleic acid sequence of the protease domain and SEQ ID No. 6 the
encoded MTSP4 amino acid protease domain; SEQ ID No. 7 MTSP4-L a nucleic
acid sequence and SEQ ID No. 8 the encoded MTSP4-L amino acid sequence; SEQ
ID No. 9 an MTSP4-S encoding nucleic acid sequence and SEQ ID No. 10 the
encoded MTSP4-S amino acid sequence; and SEQ ID No. 11 an MTSP6 encoding
nucleic acid sequence and SEQ ID No. 12 the encoded MTSP6 amino acid 
sequence. The single chain protease domains of each are delineated below. 

As described in the application, and noted above, Appellant has discovered that the 
protease domain as a single chain polypeptide that contains only the protease domain of an 
MTSP protease possesses protease activity. Prior to this the dogma in the protease field was 
that these serine proteases exist as a zymogen that requires activation cleavage for activity. 
Activation cleavage cleaves the disulfide bond that forms between a cysteine residue in the 
protease domain and another domain of the enzyme. As a result of the activation cleavage, 
the active protease occurs as a two-chain or multi-chain molecule. See, e.g., Lin et al. (J.
Biol. Chem. 274:18231-18236 (1999), Exhibit 20), which teaches that serine proteases are
synthesized as single-chain zymogens, which are proteolytically activated to become active
two-chain forms (e.g., see page 18235, col. 2, first full paragraph); and Takeuchi et al. (Proc.
Natl. Acad. Sci. USA 96:11054-11061 (1999), Exhibit 3), which describes the pro-domain
region of its MTSP1 as disulfide bonded to the protease domain (see page 11058, col. 1 and
page 11060, col. 1, first paragraph) and as remaining bonded to the protease domain after
auto-activation (page 11058, lines 8-9), resulting in a polypeptide that includes a protease domain
disulfide bonded to a pro-domain having a two-chain form.

The application teaches (see, e.g., page 8, lines 15-21; page 20, lines 1-6; page 25,
line 4 through page 26, line 25; page 58, lines 5-11) that the single chain protease domain is
active. The application also teaches how to identify a protease domain (see, e.g., page 8, 
lines 7-14 and page 19, lines 3-24). For example, at page 18, line 24 through page 20, line 6, 
the specification defines a protease domain of an MTSP as well as the requisites for activity 
and how to identify a protease domain as: 

As used herein, a "protease domain of an MTSP" refers to the protease domain 
of MTSP that is located within the extracellular domain of a MTSP and exhibits 
serine proteolytic activity. It includes at least the smallest fragment thereof that acts 
catalytically as a single chain form. Hence it is at least the minimal portion of the 
extracellular domain that exhibits proteolytic activity as assessed by standard assays 
in vitro assays. Those of skill in this art recognize that such protease domain is the 
portion of the protease that is structurally equivalent to the trypsin or chymotrypsin
fold. 

Exemplary MTSP proteins, with the protease domains indicated, are illustrated 
in Figures 1-3. Smaller portions thereof that retain protease activity are 
contemplated. The protease domains vary in size and constitution, including 
insertions and deletions in surface loops. They retain conserved structure, 
including at least one of the active site triad, primary specificity pocket, 
oxyanion hole and/or other features of serine protease domains of proteases. 
Thus, for purposes herein, the protease domain is a portion of a MTSP, as defined 
herein, and is homologous to a domain of other MTSPs, such as corin, 
enteropeptidase, human airway trypsin-like protease (HAT), MTSP1, TMPRSS2,
and TMPRSS4, which have been previously identified; it was not recognized,
however, that an isolated single chain form of the protease domain could function
proteolytically in in vitro assays. As with the larger class of enzymes of the
chymotrypsin (S1) fold (see, e.g., Internet accessible MEROPS data base), the
MTSPs protease domains share a high degree of amino acid sequence identity. 
The His, Asp and Ser residues necessary for activity are present in conserved 
motifs. The activation site, which results in the N-terminus of second chain in the 
two chain forms is has a conserved motif and readily can be identified (see, e.g., 
amino acids 801-806, SEQ ID No. 62, amino acids 406-410, SEQ ID No. 64; amino 
acids 186-190, SEQ ID No. 66; amino acids 161-166, SEQ ID No. 68; amino acids 
255-259, SEQ ID No. 70; amino acids 190-194, SEQ ID No. 72). 

As used herein, the catalytically active domain of an MTSP refers to the 
protease domain 

Significantly, it is shown herein, that, at least in vitro, the single chain
forms of the MTSPs and the catalytic domains or proteolytically active
portions thereof (typically C-terminal truncations) thereof exhibit protease
activity. Hence provided herein are isolated single chain forms of the protease
domains of MTSPs and their use in in vitro drug screening assays for identification
of agents that modulate the activity thereof.

The specification teaches modified protease domains (see, e.g., page 11, the description for each
of the working examples, and the working examples, which describe replacement of the free
(unpaired) Cys residue in the protease domain):

Also provided are muteins of the single chain protease domains and MTSPs, 
particularly muteins in which the Cys residue in the protease domain that is free 
(i.e., does not form disulfide linkages with any other Cys residue in the protein) is 
substituted with another amino acid substitution, preferably with a conservative 
amino acid substitution or a substitution that does not eliminate the activity, and 
muteins in which a glycosylation site(s) is eliminated. Muteins in which other 
conservative amino acid substitutions in which catalytic activity is retained are also 
contemplated (see, e.g., Table 1, for exemplary amino acid substitutions). See, also, 
Figure 4, which identifies the free Cys residues in MTSP3, MTSP4 and MTSP6. 

Claims on Appeal and exemplary supporting disclosure in the application 

Claims 1, 11-13, 20, 34-36, 40-42, 113 and 114 are the subject of this appeal and each is
argued separately throughout. Independent Claim 1 is directed to an isolated, substantially
purified (e.g., see page 46, lines 4-15) single-chain polypeptide, consisting only of a protease
domain of a type-II membrane-type serine protease (MTSP) (e.g., see page 17, line 24 through
page 19, line 2 and page 25, line 4-page 26, line 12) or a catalytically active fragment thereof
(e.g., see page 26, lines 13-25) as a single chain (e.g., see page 26, lines 13-25 and 58, lines
5-11), wherein a free Cys (e.g., see page 10, lines 4-6) in the protease domain is replaced with
another amino acid (e.g., see page 10, lines 3-13); and the MTSP protease domain or
catalytically active fragment thereof has serine protease activity (e.g., see page 31, lines 14-20)
as a single chain (e.g., see page 26, lines 13-25 and 58, lines 5-20; original claim 1). All claims
ultimately depend from claim 1.

Dependent claim 1 1 is directed to the substantially purified polypeptide of claim 1, 
wherein the MTSP is selected from among MTSP1, MTSP3, MTSP4 and MTSP6 (e.g., see
page 8, line 30 through page 9, line 8 and original claim 11). 

Dependent claim 12 is directed to the substantially purified (e.g., see page 46, lines
4-15) polypeptide of claim 1, where the MTSP protease domain consists of a sequence of
amino acid residues selected from among amino acids 615-855 of SEQ ID No. 2, amino acids 
205-437 of SEQ ID NO. 4, the amino acid residues set forth as SEQ ID No. 6 or as amino 
acids 217-443 in SEQ ID No. 12 (e.g., see page 25, lines 22-27 and original claim 12). 

Dependent claim 13 is directed to the substantially purified (e.g., see page 46, lines 4- 
15) polypeptide of claim 1 that has at least about 95% sequence identity with a protease 



-9- 



Applicant 
Serial No. 
Filed 




Madison et aL Attorney's DoCRet No.: 1 19385-00028 / 1607 

09/776,191 APPELLANT'S APPEAL BRIEF 

February 2, 2001 



Customer Number: 77202 

domain consisting of a sequence of amino acid residues selected from among amino acids 
615-855 of SEQ ID No. 2, amino acids 205-437 of SEQ ID NO. 4, the amino acids set forth 
as SEQ ID No. 6, and amino acids 217-443 in SEQ ID No. 12 (e.g., see page 25, lines 22-31 
and original claim 13). 

Dependent claim 20 is directed to the polypeptide of claim 1, where a free Cys in the 
protease domain is replaced with a serine (e.g., see page 10, lines 3-13, page 163, lines 4-8
and original claim 20). 

Dependent claim 34 is directed to the polypeptide of claim 1, where the MTSP is 
selected from among corin, MTSP1, enteropeptidase, human airway trypsin-like protease
(HAT), TMPRSS2, and TMPRSS4 (e.g., see page 8, line 30 through page 9, line 8 and
original claim 34). 

Dependent claim 35 is directed to a conjugate (e.g., see page 38, lines 1-8 and page 
123, line 30 through page 136, line 2), that includes a) a polypeptide of claim 1, and b) a 
targeting agent (e.g., see page 38, lines 9-15 and page 130, lines 9-17) linked to the protein 
directly or via a linker (e.g., see page 126, line 9 through page 130, line 7), where the 
conjugate has serine protease activity (e.g., see page 10, lines 3-13 and original claim 35). 

Dependent claim 36 is directed to a conjugate of claim 35, wherein the targeting agent 
permits i) affinity isolation or purification of the conjugate; ii) attachment of the conjugate 
to a surface; iii) detection of the conjugate; or iv) targeted delivery to a selected tissue or cell 
(e.g., see page 14, lines 19-26 and original claim 36). 

Dependent claim 40 is directed to a solid support (e.g., see page 126, lines 12-15) 
comprising two or more polypeptides of claim 1 linked thereto either directly or via a linker 
(e.g., see page 131, line 92 through page 134, line 30 and original claims 39). 

Dependent claim 41 is directed to the solid support of claim 40 and recites that the 
polypeptides comprise an array (e.g., see page 132, lines 4-8 and original claim 40). 

Dependent claim 42 is directed to the solid support of claim 41 and recites that the 
array includes polypeptides having different MTSP protease domains (e.g., see and original
claim 41). 

Dependent claim 113 is directed to a solid support (e.g., see page 126, lines 12-15)
comprising two or more polypeptides of claim 12 linked thereto either directly or via a linker 
(e.g., see page 126, line 9 through page 130, line 7 and original claim 112). 




Claim 114 depends from claim 113 and specifies that the polypeptides comprise an array (e.g.,
see page 132, lines 4-8 and original claim 113).

A list of the currently pending claims is provided in the Claims Appendix of this
Brief.




VI. GROUNDS OF REJECTION TO BE REVIEWED ON APPEAL 

A. Rejections under 35 U.S.C. § 112, first paragraph

1. Claims 1, 11, 20, 34-36, 40-42, 113 and 114 are rejected under 35 U.S.C. §112,
first paragraph, as containing subject matter that was not described in the 
specification in such a way as to reasonably convey to one skilled in the relevant 
art that the inventor(s), at the time the application was filed, had possession of the 
claimed subject matter. 

2. Claims 1, 11, 20, 34-36, 40-42, 113 and 114 are rejected under 35 U.S.C. §112,
first paragraph, because the specification, while being enabling for a polypeptide 
consisting of amino acids 615-855 of SEQ ID NO:2, allegedly does not 
reasonably provide enablement for a polypeptide consisting of any protease 
domain of any type II membrane type serine protease (MTSP) or a catalytically 
active portion thereof. 

B. Rejection under 35 U.S.C. 102(b) 

Claims 1, 11-13, 20, 34-36, 40-42, 113 and 114 are rejected under 35 U.S.C.
§102(b) as being anticipated by Takeuchi et al., Proc. Natl. Acad. Sci. USA 96:
11054-11061 (1999) ("Takeuchi"), a copy of which is attached in the Evidence
Appendix as Exhibit 3. 

C. Rejection under 35 U.S.C. 102(e) 

Claims 1, 11-13 and 34 are rejected under 35 U.S.C. § 102(e) as anticipated by
O'Brien et al., U.S. Patent No. 5,972,616 ("O'Brien"), a copy of which is attached
in the Evidence Appendix as Exhibit 4. 

D. Rejection under 35 U.S.C. 103(a) 

Claims 1, 11-13 and 34-36, 40-42 and 113-114 are rejected under 35 U.S.C.
103(a) as being unpatentable over O'Brien. 







VII. ARGUMENTS



1. REJECTION OF CLAIMS 1, 11, 20, 34-36, 40-42, 113 AND 114 UNDER 35
U.S.C. §112, FIRST PARAGRAPH - POSSESSION



Claims 1, 11, 20, 34-36, 40-42, 113 and 114 are rejected under 35 U.S.C. §112,
first paragraph, as allegedly containing subject matter that was not described in the 
specification in such a way as to reasonably convey to one skilled in the art that the 
inventor, at the time the application was filed, had possession of the claimed subject 
matter. The Examiner alleges that claims 1, 11, 20, 34-36, 40-42 and 113-114 are drawn
to a polypeptide consisting of a protease domain or catalytically active fragment thereof of 
type-II membrane-type serine protease (MTSP) from any source and concludes that these 
claims are drawn to a genus of polypeptides having any structure. The Examiner alleges 
that the specification only teaches four species, and that four species are not a sufficient 
number of representative species of the genus to describe the whole genus. The Examiner 
also alleges that there is no evidence on the record of the relationship between the structure 
of the exemplary catalytically active protease domains and the structure of the serine 
protease domain of any or all MTSP polypeptides or MTSP1 polypeptides. The Final
Office Action concludes that the specification fails to sufficiently describe the claimed 
subject matter in such full, clear, concise, and exact terms that a skilled artisan would 
recognize that Appellant was in possession of the claimed subject matter. The rejection 
respectfully is traversed. 

A. LEGAL STANDARDS - 35 U.S.C. §112, FIRST PARAGRAPH - 
POSSESSION 

The purpose behind the written description requirement is to ensure that the patent
applicant had possession of the claimed subject matter at the time of filing of the application.
The relevant law and a discussion of the Patent Office Guidelines are set forth in the previous
responses of record in this application and below. Briefly, the Federal Circuit has discussed
the application of the written description requirement of the first paragraph of § 112 to claims in
the field of biotechnology. See University of California v. Eli Lilly and Co., 119 F.3d 1559,
43 U.S.P.Q.2d 1398, 1406 (Fed. Cir. 1997). The court explained that:

In claims involving chemical materials, generic formulae usually 
indicate with specificity what the generic claims encompass. One skilled in 
the art can distinguish such a formula from others and can identify many of 
the species that the claims encompass. Accordingly, such a formula is 
normally an adequate description of the claimed genus ... a generic 
statement such as "vertebrate insulin" or "mammalian insulin" without more, is
not an adequate written description of the genus because it does not
distinguish the claimed genus from others, except by function. It does not
specifically define any of the genes that fall within its definition. It does not
define any structural features commonly possessed by members of the genus
that distinguish them from others. One skilled in the art therefore cannot, as
one can do with a fully described genus, visualize or recognize the identity of
the members of the genus. A definition by function, as we have previously 
indicated, does not suffice to define the genus because it is only an indication 
of what the gene does, rather than what it is. 

The court also stated that "[a] written description of an invention involving a chemical
genus, like a description of a chemical species, 'requires a precise definition, such as by
structure, formula, [or] chemical name,' of the claimed subject matter sufficient to distinguish it
from other materials." Id. at 1567, 43 U.S.P.Q.2d at 1405. Finally, the court addressed the
manner by which a genus might be described. "A description of a genus of cDNA may be
achieved by means of a recitation of a representative number of cDNAs, defined by nucleotide 
sequence, falling within the scope of the genus or of a recitation of structural features common 
to the members of the genus, which features constitute a substantial portion of the genus." Id. 

The Federal Circuit also has addressed the written description requirement in the
context of biotechnology-related subject matter in Enzo Biochem, Inc. v. Gen-Probe, 296 F.3d
1316, 63 USPQ2d (BNA) 1609 (Fed. Cir. 2002). The Enzo court adopted the standard that:

the written description requirement can be met by 'showing that an invention 
is complete by disclosure of sufficiently detailed, relevant identifying 
characteristics . . . complete or partial structure, other physical chemical 
properties, functional characteristics when coupled with a known or 
disclosed correlation between function and structure, or some combination of 
such characteristics.' 

The court in Enzo adopted its standard from the Written Description Examination Guidelines.
The Guidelines apply to proteins as well as nucleic acid molecules.

It is well-settled that the written description requirement of 35 U.S.C. §112, first
paragraph, can be satisfied without express or explicit disclosure of a later-claimed invention.
See, In re Herschler, 591 F.2d 693, 700-01, 200 USPQ 711, 717 (CCPA 1979):

"The claimed subject matter need not be described in haec verba to satisfy the 
description requirement. It is not necessary that the application describe the 
claim limitations exactly, but only so clearly that one having ordinary skill in the 
pertinent art would recognize from the disclosure that appellants invented 
processes including those limitations." (citations omitted). 

See also Purdue Pharma L. P. v. Faulding, Inc., 230 F.3d 1320, 56 USPQ2d 1481 (Fed. Cir. 
2000). 




The written description requirement of 35 U.S.C. § 112, first paragraph, can be
satisfied by providing sufficient disclosure, either through illustrative examples or 
terminology. This clause does not require "a specific example of everything within the scope 
of a broad claim." In re Anderson, 176 USPQ 331, at 333 (CCPA 1973), emphasis in 
original. Further, because "it is manifestly impracticable for an applicant who discloses a 
generic invention to give an example of every species falling within it, or even to name every 
such species, it is sufficient if the disclosure teaches those skilled in the art what the invention 
is and how to practice it." In re Grimme, Keil and Schmitz, 124 USPQ 449, 502 (CCPA 
1960). 

B. THE REJECTION OF CLAIMS 1-3, 5, 9, 11, 19, 20, 34-36, 40-42, 113 AND 114
UNDER 35 U.S.C. §112, FIRST PARAGRAPH SHOULD BE REVERSED
BECAUSE THE SPECIFICATION MEETS THE WRITTEN DESCRIPTION
REQUIREMENT WITH RESPECT TO POSSESSION

Claim 1

In setting forth the rejection, the Examiner states that the claims are drawn to
polypeptides having any structure and are thus drawn to a genus encompassing species having
substantial variation. The Examiner states that only four species are described in the
specification and that there is no evidence on the record of the relationship between the
structure of the exemplary catalytically active protease domains and the structure of the serine
protease domain of any or all MTSP polypeptides. Appellant respectfully submits that this is
not correct.

1. Standard for satisfying the written description requirement for possession

In order to satisfy the written description requirement, one need not provide an
example of every species encompassed by a claim. It is sufficient to provide identifying
characteristics, including structural and physical characteristics, functional characteristics
coupled with known or disclosed correlation with structural characteristics to demonstrate that
the applicant was in possession of the claimed subject matter. MPEP § 2163; see University
of California v. Eli Lilly, 119 F.3d 1559, 1568, 43 USPQ2d 1398, 1406 (Fed. Cir. 1997).
Further, the standard is an objective one, based on what one of skill in the art would recognize
in the disclosure. In re Gosteli, 872 F.2d at 1012. As is discussed in more detail below, it
respectfully is submitted that the instant application sufficiently describes the claimed genus
of isolated MTSP protease domains to demonstrate possession of the claimed subject matter at
the time of the effective filing date of each claim as required by this standard.




2. Specification describes more than four species of MTSP protease domains 

In this instance, the specification identifies all known members of the family and
identifies several new members, including protease domains (as well as full-length) of MTSP3,
MTSP6 and two splice variants of MTSP4. Thus, contrary to the Examiner's assertion that the
specification provides only four species of protease domains, Appellant respectfully submits
that the application identifies all of the 17 members of the MTSP family (see, e.g.,
page 4) known at the time of filing, and provides the sequences of full-length MTSP proteases
and identifies the protease domains thereof. In addition, the specification teaches how to
identify a protease domain in an MTSP, how to identify a free Cys residue and how to replace a
Cys residue. The members of the MTSP family provided include MTSP1 (also referred to
as matriptase and TADG-15), MTSP3, MTSP4 (two variants encoded by splice variants),
MTSP6, corin, enteropeptidase, human airway trypsin-like protease (HAT), hepsin, TMPRS2
and TMPRSS4. For example, page 4, line 20 through page 5, line 17 of the specification
states:



In marrunals, at least 17 members of the family are known, including seven in humans 
(see, Hooper et al, (2001) J. Biol. Chem. 276:857-860). These include: corin (accession 
nos. AF133845 and AB013874; see, Yan et al, (1999) J. Biol. Chem. 274:14926-14938; 
Tomita et al, (1998) J. Biochem. 124:784-789; Uan et al. (2000) Proc. Natl. Acad. Sci. 
U.SA. 97:8525-8529); enteropeptidase (also designated enterokinase; accession no. 
U09860 for the human protein; see, Kitamoto et al (1995) Biochem. 27: 4562-4568; 
Yahagi et al (1996) Biochem. Biophys. Res. Commun. 219:806-812; Kitamoto et al, 
(1994) Proc. Natl. Acad. Sci. U.S.A. 91:7588-7592; Matsushima et aL (1994) J. Biol. 
Chem. 269:19976-19982;); human airway trypsin-like protease (HAT; accession no. 
AB002134; see Yamaoka et aL J. Biol. Chem. 273:11894-1 1901); MTSPl and matriptase 
(also called TADG-15; see SEQ ID Nos. 1 and 2; accession nos. AFl 33086/ AFl 18224, 
AF04280022; Takeuchi et al, (1999) Proc. Natl. Acad. Sci. U.S.A. 96:1 1054-1 161; Lin et 
aL (1999) J. Biol. Chem. 274:18231-18236; Takeuchi et aL (2000) J. Biol. Chem. 
275:26333-26342; and Kim et al. (1999) Immunogenetics 49:420-429); hepsin (see, 
accession nos. Ml 8930, AF030065, X70900; Leytus et aL (1988) Biochem. 27: 1 1895- 
11901; Vu etaL (1997) J. Biol. Chem. 272:31315-31320; and Farley et al. (1993) 
Biochem. Biophys. Acta 1 173:350-352; and see, U.S. Pat. No. 5,972,616); TMPRS2 (see, 
Accession Nos. U75329 and AFl 13596; Paoloni-Giacobino et aL (1997) Genomics 
44:309-320; and Jacquinet et al. (2000) FEBS Lett. 468: 93-100); and TMPRSS4 (see. 
Accession No. NM 016425; Walhapp et al, (2000) Cancer 60:2602-2606). 

Thus, the specification provides 17 examples of MTSPs and isolated protease domains {e,g.^ 
see also pages 9-10), including MTSPl, MTSP3, MTSP4 (2 splice variants) and MTSP6, 
incorporates publications describing all known family members and the protease domains 
thereof, and describes full-length sequences. 



recites: 




3. MTSPs are a known family of serine proteases with known structural features 

As noted, the MTSPs are a known and well studied family of enzymes. The 
specification teaches how to identify members of the MTSP family and provides 
relevant structural and functional features that uniquely identify and specify the 
claimed genus of polypeptides. The MTSP protease family of enzymes has been extensively 
studied and characterized, as evidenced by the art made of record in Information Disclosure 
Statements and provided in previous responses and herein. Hooper et al. teaches that many of 
the serine proteases are mosaic proteins that include multiple, structurally distinct domains 
necessary for regulating enzymatic activity (Eur. J. Biochem. 267: 6931-6937 (2000), Exhibit 
14). Lin et al. ((1999) J. Biol. Chem. 274:18231-36, Exhibit 20) and Yan et al. ((1999) J. 
Biol. Chem. 274:14926-35, Exhibit 44) teach that MTSPs are a family of proteins that can be 
distinguished from many other types of proteins and enzymes because they have highly 
conserved structures. For example, as discussed in the instant specification, it is known in the 
art that a substrate specificity pocket in the protease domain and conserved cysteines that 
participate in disulfide bonding are highly conserved features in serine proteases (see, e.g., 
Figure 4 and page 18235 of Lin et al. (Exhibit 20) and Figure 2 and page 18236 of Yan et al., 
Exhibit 44). 

MTSPs are a class of serine proteases characterized by having an NH2-terminal 
cytoplasmic tail and a COOH-terminal ectodomain, lacking an NH2-terminal cleavable signal 
sequence, and having a signal/anchor domain that anchors the serine protease in the cell 
membrane (e.g., see Parks et al., J. Biol. Chem. 268: 19101-19109 (1993), Exhibit 26 and 
Parks & Lamb, Cell 64: 777-787 (1991), Exhibit 27). Tsuji et al. teaches that MTSPs, such 
as hepsin, include a hydrophobic sequence flanked by a sequence having a net positive 
charge on the NH2-terminal side while the COOH-terminal flanking side contains no charge, 
which agrees with the consensus topological sequence for the MTSPs (Tsuji et al., J Biol 
Chem 266(25): 16948-16953 (1991), Exhibit 37). The MTSPs have the triad of residues 
His57, Asp102 and Ser195 at the active site (chymotrypsin numbering system), which are in 
close proximity and serve as a functional interacting unit responsible for bond formation and 
cleavage during catalysis (Craik et al., Science 237:909-913 (1987), Exhibit 10). Thus, an 
MTSP polypeptide can be characterized as a serine protease that includes the conserved 
catalytic triad, lacks a cleavable signal sequence, includes a transmembrane anchoring 
domain, has positively charged residues on the N-terminal side of a long stretch of 
hydrophobic amino acids and has a characteristic disulfide bond pattern (Walter et al., Annu. 
Rev. Cell Biol. 2: 499-516 (1986), Exhibit 40). The lack of a signal sequence, a 
characteristic disulfide bond pattern, a characteristic hydrophobic region and the presence of 
a signal/anchor domain also are seen in all of the MTSPs, including hepsin (Leytus et al., 
Biochemistry 27: 1067-1074 (1988), Exhibit 19), enteropeptidase (Kitamoto et al., Proc. 
Natl. Acad. Sci. USA 91: 7588-7592 (1994), Exhibit 17), TMPRSS2 (Paoloni-Giacobino et 
al., Genomics 44: 309-320 (1997), Exhibit 31), and human airway trypsin-like protease 
(Yamaoka et al., J. Biol. Chem. 273: 11895-11901 (1998), Exhibit 43). 

The specification also describes structural features and structure-function 
relationships that identify the MTSP family of polypeptides. Such description includes 
information regarding the tertiary structure of the polypeptide. For example, the specification 
teaches the locus of the disulfide bonds, identifies the Cys residues that link the protease 
domain to the rest of the polypeptide, and teaches that the polypeptide includes at least one of 
the active site triad, primary specificity pocket and oxyanion hole. The specification states 
that the MTSP family of proteins shares a high degree of homology. Hence, other MTSPs, 
such as MTSPs from other species, can be readily identified by their homology with known 
MTSPs. The specification also teaches that the protease domain of an MTSP shares homology 
and structural features with the chymotrypsin/trypsin family protease domains. The previous 
responses of record and the application establish that the application describes the MTSP 
family and describes identification and isolation of protease domains. 

Most significantly, the application identifies the known members of the MTSP family, 
provides sequences thereof and/or references earlier publications describing the family 
members, and provides working examples for MTSP1, MTSP3, MTSP6 and the two MTSP4 
splice variants. 

4. The specification provides relevant identifying characteristics of the protease domain 

As discussed in responses of record, methods of identifying and isolating serine protease 
domains of MTSPs were known in the art at the time of filing the application and are taught in 
the specification. The specification describes protease domains of MTSPs and provides 
sequences of exemplars thereof. For example, the specification teaches, e.g., at page 19, lines 3- 
24, that: 

Exemplary MTSP proteins, with the protease domains indicated, are illustrated in 
Figures 1-3. Smaller portions thereof that retain protease activity are contemplated. 
The protease domains vary in size and constitution, including insertions and deletions 
in surface loops. They retain conserved structure, including at least one of the active 
site triad, primary specificity pocket, oxyanion hole and/or other features of serine 
protease domains of proteases. Thus, for purposes herein, the protease domain is a 
portion of a MTSP, as defined herein, and is homologous to a domain of other MTSPs, 
such as corin, enteropeptidase, human airway trypsin-like protease (HAT), MTSP1, 
TMPRSS2, and TMPRSS4, which have been previously identified; it was not 
recognized, however, that an isolated single chain form of the protease domain could 
function proteolytically in in vitro assays. As with the larger class of enzymes of the 
chymotrypsin (S1) fold (see, e.g., Internet accessible MEROPS data base), the MTSPs 
protease domains share a high degree of amino acid sequence identity. The His, Asp 
and Ser residues necessary for activity are present in conserved motifs. The activation 
site, which results in the N-terminus of second chain in the two chain forms is has a 
conserved motif and readily can be identified (see, e.g., amino acids 801-806, SEQ ID 
No. 62, amino acids 406-410, SEQ ID No. 64; amino acids 186-190, SEQ ID No. 66; 
amino acids 161-166, SEQ ID No. 68; amino acids 255-259, SEQ ID No. 70; amino 
acids 190-194, SEQ ID No. 72). 

The specification also describes how to identify a protease domain of the MTSPs (see, e.g., 
page 8): 

The protease domains as provided herein are single-chain 
polypeptides, with an N-terminus (such as IV, VV, IL and II) generated at 
the cleavage site (generally have the consensus sequence R↓VVGG, 
R↓IVGG, R↓ILGG, R↓VGLL, R↓ILGG or a variation thereof; an N- 
terminus of R↓V or R↓I, where the arrow represents the cleavage point) 
when the zymogen is activated. To identify a protein domain an R↓ 
should be identified, and then following amino acids compared to the 
above noted motif[s]. [emphasis added] 

The instant specification teaches that the protease domain includes as a common 
structural feature a conserved catalytic triad. The art of record evidences that this is a 
characteristic feature. For example, Lin et al. teaches that membrane-type serine proteases 
include an invariant catalytic triad, a characteristic disulfide pattern and a proteolytic 
activation site in an Arg-Val-Val-Gly-Gly motif similar to the characteristic RIVGG motif in 
other serine proteases (Lin et al., J Biol Chem 274(26): 18231-18236 (1999), Exhibit 21). 
Kitamoto et al. teaches that the catalytic domain of MTSPs has a characteristic disulfide bond 
pattern (Kitamoto et al., Proc Natl Acad Sci USA 91: 7588-7592 (1994), Exhibit 17). The 
specification teaches how to identify members of the MTSP family. For example, page 49, 
lines 3-10 of the specification recites: 

The MTSPs are a family of transmembrane serine proteases that are found in 
mammals and also other species that share a number of common structural 
features including: a proteolytic extracellular C-terminal domain; a 
transmembrane domain, with a hydrophobic domain near the N-terminus; a short 
cytoplasmic domain; and a variable length stem region containing modular 
domains. The proteolytic domains share sequence homology including conserved 
his, asp, and ser residues necessary for catalytic activity that are present in 
conserved motifs. 

Accordingly, the specification and the prior art set forth specific structural and physical 
features that define MTSPs and their protease domains. 
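
The identification step taught in the specification (locating the activation cleavage site after an Arg residue and comparing the residues that follow to the noted motifs) can be illustrated by a minimal sketch. The sketch is offered for illustration only and is not part of the specification: the motif list, the function name and the example sequences are hypothetical assumptions, and a match merely flags a candidate site that would still be confirmed by the conserved structural features and the activity assays discussed herein.

```python
import re

# Assumption: illustrative motifs for the residues that follow the Arg at the
# cleavage point. This list is hypothetical and not exhaustive.
ACTIVATION_MOTIFS = ("VVGG", "IVGG", "ILGG", "VGLL")

# An Arg (R) immediately followed by one of the motifs; the lookahead keeps
# the match limited to the Arg itself so overlapping sites are not skipped.
_SITE_PATTERN = re.compile(r"R(?=(?:" + "|".join(ACTIVATION_MOTIFS) + r"))")


def find_candidate_activation_sites(sequence: str) -> list[int]:
    """Return 0-based indices of the residue that follows each candidate
    cleavage point, i.e. the putative N-terminus of the protease domain."""
    return [match.end() for match in _SITE_PATTERN.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real MTSP sequence.
    toy = "MASTRVVGGEDA"
    for start in find_candidate_activation_sites(toy):
        print(start, toy[start:])
```

The function returns every candidate rather than selecting one, since choosing among candidates relies on the other conserved features (catalytic triad, disulfide pattern) described in the specification and the art of record.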

5. The specification provides relevant identifying characteristics of the genus 

In addition to describing known and newly provided protease domains, the specification 
provides relevant identifying characteristics of the "genus" of serine protease domains as 
instantly claimed, including conserved structural and functional characteristics of an MTSP 
protease domain, provides a number of exemplary protease domains, and also directs those 
skilled in the art to exemplary art that describes common structural and functional features 
shared by the protease domain of MTSPs. For example, see page 26, lines 13-25, which 
recites: 

Hence smaller portions of the protease domains, particularly the single chain domains, 
thereof that retain protease activity are contemplated. Such smaller versions will 
generally be C-terminal truncated versions of the protease domains. The protease 
domains vary in size and constitution, including insertions and deletions in surface 
loops. Such domains exhibit conserved structure, including at least one structural 
feature, such as the active site triad, primary specificity pocket, oxyanion hole and/or 
other features of serine protease domains of proteases. Thus, for purposes herein, the 
protease domain is a single chain portion of an MTSP, as defined herein, but is 
homologous in its structural features and retention of sequence of similarity or 
homology the protease domain of chymotrypsin or trypsin. Most significantly, the 
polypeptide will exhibit proteolytic activity as a single chain. 

The specification teaches that included among the conserved features of MTSP protease 
domain polypeptides is a catalytic triad and an activation cleavage site, which defines the 
terminus of the protease domain polypeptides when they are isolated as single chain 
polypeptides. 

The specification explains that beyond such conserved features, the polypeptides are 
tolerant of modification. The specification explains that such modifications can be effected 
using numerous methods known in the art. For example, at page 77, line 17 through page 78, 
line 11, the specification states: 

A variety of modifications of the MTSP proteins and domains are contemplated 
herein. An MTSP-encoding nucleic acid molecule can be modified by any of numerous 
strategies known in the art (Sambrook et al., 1990, Molecular Cloning, A Laboratory 
Manual, 2d ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, New York). The 
sequences can be cleaved at appropriate sites with restriction endonuclease(s), followed 
by further enzymatic modification if desired, isolated, and ligated in vitro. In the 
production of the gene encoding a domain, derivative or analog of MTSP, care should be 
taken to ensure that the modified gene retains the original translational reading frame, 
uninterrupted by translational stop signals, in the gene region where the desired activity is 
encoded. 

Additionally, the MTSP-encoding nucleic acid molecules can be mutated in vitro or 
in vivo, to create and/or destroy translation, initiation, and/or termination sequences, or to 
create variations in coding regions and/or form new restriction endonuclease sites or 
destroy pre-existing ones, to facilitate further in vitro modification. Also, as described 
herein muteins with primary sequence alterations, such as replacements of Cys residues 
and elimination of glycosylation sites are contemplated. Such mutations may be effected 
by any technique for mutagenesis known in the art, including, but not limited to, 
chemical mutagenesis and in vitro site-directed mutagenesis (Hutchinson et al., J. Biol. 
Chem. 253:6551-6558 (1978)), use of TAB® linkers (Pharmacia). In one embodiment, 
for example, an MTSP protein or domain thereof is modified to include a fluorescent 
label. In other specific embodiments, the MTSP protein is modified to have a 
heterofunctional reagent, such heterofunctional reagents can be used to crosslink the 
members of the complex. 

The specification incorporates by reference and directs those skilled in the art to 
exemplary art that describes common structural and functional features shared by the protease 
domain of MTSPs. For example, Lin et al. (J. Biol. Chem. 274:18231-36 (1999), Exhibit 20) 
and Yan et al. (J. Biol. Chem. 274:14926-35 (1999), Exhibit 44) teach that MTSPs have 
highly conserved structures, including a cleavage site at the N-terminus of the protease 
domain, a substrate specificity pocket in the protease domain and highly conserved cysteines 
that participate in disulfide bonding (see, e.g., Figure 4 and page 18235 of Lin et al. (Exhibit 
20) and Figure 2 and page 18236 of Yan et al. (Exhibit 44)). Other conserved elements 
include a conserved activation motif ((R/K)VIGG), residues Asp627, Gly-655 and Gly-665 in 
the substrate pocket, with Asp at the bottom of the substrate pocket, and eight conserved 
cysteines that form intramolecular disulfide bonds (Lin et al., J Biol Chem 274(26): 18231- 
18236 (1999), Exhibit 20). In addition, a correlation between retention of the catalytic triad 
and retention of serine protease activity was demonstrated and known in the art at the time of 
filing. For example, Craik et al. (Science 237: 909-913 (1987), Exhibit 10), Sprang et al. 
(Science 237: 905-909 (1987), Exhibit 35), Carter et al. (Nature 332: 564-568 (1988), Exhibit 
8) and Bachovchin et al. (Proc. Natl. Acad. Sci. 78: 7323-7326 (1981), Exhibit 5) teach that 
serine protease activity is retained in an MTSP by retaining the conserved structure of the 
catalytic triad. 

The specification provides methods for identification, production, isolation, synthesis 
and/or purification of MTSP protease domains (see, e.g., working Examples 1-4, which 
describe cloning and expression of the protease domains with the Cys replaced; Example 5 
demonstrates assays for identifying inhibitors of the catalytic activity of each). The 
specification states, for example, that MTSP3, MTSP4 and MTSP6 are isolated from any 
animal, particularly a mammal, and include but are not limited to, humans, rodents, fowl, 
ruminants and other animals (see page 20, lines 21-23; page 21, lines 11-13; and page 21, 
lines 29-31, respectively). Alternative methods for obtaining the MTSP protein other than by 
directly isolating the MTSP protein also are provided. These include synthesis using 
genomic DNA, chemically synthesizing the gene sequence from a known sequence and 
making cDNA to the mRNA that encodes the MTSP protein, for example, and inserting the 
isolated nucleic acids into an appropriate cloning vector (for example, see pages 67-79). 
Methods of identifying and isolating serine protease domains from MTSPs, such as MTSP1 
and matriptase (also referred to as TADG-15), corin, enteropeptidase, human airway trypsin- 
like protease (HAT), hepsin, TMPRSS2 and TMPRSS4, were known in the art at the time of 
filing the application and are taught in the specification (e.g., see page 4, line 20 through page 
5, line 17). 

In addition, the specification provides exemplary assays in which catalytic activity of 
the polypeptides can be tested (e.g., see Examples 3 and 4). Thus, the specification describes 
the sequences and provides references, which are incorporated by reference, describing all of 
the known members of the MTSP family and the protease domains thereof, teaches how to 
identify an MTSP, teaches how to identify the protease domain of an MTSP if it is not known 
and teaches how to test the polypeptide for proteolytic activity. 

The art of record and discussed previously and herein evidences that, with the 
information provided in the specification, the skilled artisan can recognize the protease 
domain of an MTSP by its requisite protease domain structure and conserved features. If 
necessary, one of skill in the art could test the polypeptides for catalytic activity using the 
assays provided in the specification or known to those of skill in the art in order to identify those 
polypeptides that possess the requisite catalytic activity. 

6. Specification describes modification of MTSP protease domains 

As discussed above, a correlation between retention of the catalytic triad and retention 
of serine protease activity was demonstrated and known in the art at the time of filing (e.g., see 
Craik et al., Science 237: 909-913 (1987), Exhibit 10). The specification teaches additional 
modifications of the MTSP polypeptides such that protease activity is retained. For example, 
the specification explains that for each individual MTSP, the polypeptides can have about 
60% amino acid sequence identity with the exemplified MTSP. Such modified polypeptides 
exhibit serine protease activity as single chain polypeptides. The specification provides 
exemplary modifications including conservative amino acid substitution (for example, see page 
10, lines 3-13) and modifications of cysteine residues and/or of glycosylation sites (for 
example, see page 78, lines 1-7). The specification also discloses that non-natural amino acids 
can be introduced as a substitution or addition in the MTSP polypeptides (for example, see 
page 79, lines 10-21). The specification also directs those skilled in the art to exemplary art 
that describes common structural features shared by the transmembrane serine proteases (for 
example, see page 18, lines 1-15). 
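
The sequence identity figure referred to above can be illustrated with a short calculation. The following sketch assumes two sequences that have already been aligned to equal length, with "-" marking gaps, and counts identical residues over the positions where neither sequence has a gap. The passages quoted herein do not state which alignment program or identity definition is used, so the function names, the gap convention and the 60% cutoff parameter are illustrative assumptions only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over aligned positions without gaps.

    Assumption: both inputs are pre-aligned to the same length and use '-'
    for gaps; producing the alignment itself is outside this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == "-" or res_b == "-":
            continue  # skip positions where either sequence has a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 60.0) -> bool:
    """True when the identity is at or above the (assumed) threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Hypothetical toy alignment, not real MTSP sequences.
    print(percent_identity("ACDEF", "ACDQF"))
    print(meets_identity_threshold("ACDEF", "ACDQF"))
```

Identity alone does not establish activity; under the approach described above, a polypeptide meeting the threshold would still be tested in the assays provided in the specification.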

The specification exemplifies the replacement of a free Cys in the protease domain 
with another amino acid. For example, the specification states on page 10, lines 3-13 that: 

Also provided are muteins of the single chain protease domains and MTSPs, 
particularly muteins in which the Cys residue in the protease domain that is free 
(i.e., does not form disulfide linkages with any other Cys residue in the protein) is 
substituted with another amino acid substitution, preferably with a conservative 
amino acid substitution or a substitution that does not eliminate the activity, and 
muteins in which a glycosylation site(s) is eliminated. Muteins in which other 
conservative amino acid substitutions in which catalytic activity is retained are 
also contemplated (see, e.g., Table 1, for exemplary amino acid substitutions). 
See, also, FIG. 4, which identifies the free Cys residues in MTSP3, MTSP4 and 
MTSP6. 

The specification specifically describes the replacement of a free Cys in the protease 
domain with another amino acid. For example, Example 1, on page 161, lines 4-9, 
exemplifies replacing the free Cys in the protease domain with another amino acid: 

To eliminate the free cysteine (at position 310 in SEQ ID No. 4) that exists 
when the protease domain of the MTSP3 protein is expressed or the zymogen is 
activated, the free cysteine at position 310 (see SEQ ID No. 3), which is Cys 122 
if a chymotrypsin numbering scheme is used, was replaced with a serine. 

As discussed below in more detail, working examples for expression of the protease domains 
of MTSP3, MTSP1 and both MTSP4 variants are provided. 

Conclusion 

The claims are directed to isolated single chain protease domains of a known family 
of proteins, the MTSP family. The instant application provides the sequences of 17 of the 
known MTSP family members (directly or by incorporation by reference of references 
providing the sequences). The instant specification provides new members of the MTSP 
family and provides working examples providing the isolated protease domains thereof, 
where the free Cys is replaced with another amino acid. Appellant has discovered that the 
isolated single chain form of the protease domain of these polypeptides is active and can be 
used, for example, for preparing antibodies specific thereto and in diagnostic assays. Hence, the 
recitation in the claims that the polypeptides consist of a protease domain from an MTSP, are 
single-chain polypeptides having serine protease activity and have a free Cys in the protease 
domain replaced with another amino acid indicates with specificity what the generic claims 
encompass. One skilled in the art can distinguish such a polypeptide from others and can 
identify species that the claims encompass. Because the specification teaches the skilled 
artisan that the single chain protease domain of an MTSP is active, how to identify an MTSP 
and its protease domain, and how to test for activity, the skilled artisan is in possession of the 
entire genus of single chain protease domains. 

An adequate written description for a claimed genus only has to provide "relevant, 
identifying characteristics" of a representative number of species (MPEP § 2163). It is 
respectfully submitted that the instant specification meets this test. As noted, the specification 
describes all 17 known species of MTSPs and isolated protease domains (e.g., see pages 9-10), 
as well as previously unknown species (MTSP3, MTSP4 (2 splice variants) and MTSP6), 
incorporates publications describing all known family members and their full length sequences, 
and provides relevant structural and functional features that uniquely identify and specify the 
claimed genus of polypeptides. The specification teaches that those of skill in the art recognize 
common elements among MTSPs and the protease domains of MTSPs, and teaches a number 
of conserved characteristics for the MTSPs and protease domains thereof, and that the 
sequences and locus of the protease domains are known or can be determined as taught in the 
application. The specification teaches that members of the MTSP family are and were known, 
provides additional members, teaches how to identify and isolate protease domains as single 
chains and how to assess activity. One of skill in the art could, if needed, readily test any of 
those polypeptides for catalytic activity. 

Therefore, in light of Appellant's disclosure, one of skill in the art would have 
recognized from reading the application that Appellant provided single-chain polypeptides with 
the recited protease domain structure that possess serine protease activity. The combination of 
the disclosure of the specific chemical structures of all 17 species of MTSPs known at the time 
of filing; the provision and description of new species within the scope of the claims; the 
teachings in the specification (and the knowledge of those of skill in the art) of how to identify 
serine protease domains, such as based on homology as known in the art and described in the 
specification, how to isolate a protease domain and how to test for activity using the assays 
provided; and the evidence that those of skill in the art are very familiar with the MTSP 
structure and function renders it clear that one of skill in the art would recognize that Appellant 
had possession of the claimed polypeptides at the time of the priority date of each claim. One of 
skill in the art would have recognized from reading the disclosure that Appellant had 
possession of this genus as well as numerous species thereof. This teaching and knowledge 
coupled with the ability to test for species within the scope of the claims with the assays 
provided for in the specification and known in the art demonstrates that Appellant sufficiently 
described and was in possession of the polypeptides as claimed, at the effective filing date(s) of 
the claims. 

For the reasons above, each of the dependent claims meets the written description 
requirement. Additional reasons for each dependent claim are described below. 

Dependent Claim 11 

Claim 11 depends from claim 1 and includes every limitation thereof. Claim 11 recites 
that the MTSP is selected from among MTSP1, MTSP3, MTSP4 and MTSP6. The 
specification describes MTSP1, e.g., at pages 54-58. The specification describes MTSP3, e.g., 
at pages 58-60 and Example 1 (pages 160-167). The specification describes MTSP4, e.g., at 
pages 60-63 and Example 2 (pages 167-171). The specification describes MTSP6, e.g., at pages 
63-64 and Example 3 (pages 171-176). The working examples provide isolated protease 
domains with the free Cys residue replaced with another amino acid. Working Example 1 
describes preparation and cloning and expression of the protease domain of MTSP3, Examples 
2 and 4 describe cloning and expression of the protease domains of MTSPs 3 and 4, and 
Example 3 describes cloning of MTSP6. Example 4 describes expression of the MTSP4 (both 
variants), MTSP3 and MTSP6 protease domains, with the replaced Cys. Example 6 describes 
cloning and isolation of the protease domain of MTSP1. Example 7 describes production of the 
protease domain of MTSP1 and purification of the protease domain. 

Appellant respectfully submits that, in view of the arguments set forth above with 
respect to claim 1 and the teaching in the specification, which describes each of the isolated 
protease domains of MTSP1, MTSP3, MTSP4 (two splice variants) and MTSP6, where the 
free Cys is replaced with another amino acid, one of skill in the art would recognize that 
Appellant was in possession of the subject matter of claim 11 at its effective filing date. 




Dependent Claim 20 

Claim 20 depends from claim 1 and includes every limitation thereof. Claim 20 
recites that a free Cys in the protease domain is replaced with a serine. For the reasons 
articulated above with respect to claim 1, Appellant respectfully submits that one of skill in 
the art would recognize that Appellant was in possession of a substantially purified single- 
chain polypeptide consisting only of a protease domain of a type-II membrane-type serine 
protease (MTSP) or a catalytically active fragment thereof as a single chain, where the MTSP 
protease domain or catalytically active fragment thereof has serine protease activity as a 
single chain and a free Cys in the protease domain is replaced with another amino acid. 

The specification exemplifies the replacement of a free Cys in the protease domain 
with serine. For example, the specification states on page 10, lines 3-13 that: 

Also provided are muteins of the single chain protease domains and MTSPs, 
particularly muteins in which the Cys residue in the protease domain that is free 
(i.e., does not form disulfide linkages with any other Cys residue in the protein) is 
substituted with another amino acid substitution, preferably with a conservative 
amino acid substitution or a substitution that does not eliminate the activity, and 
muteins in which a glycosylation site(s) is eliminated. Muteins in which other 
conservative amino acid substitutions in which catalytic activity is retained are 
also contemplated (see, e.g., Table 1, for exemplary amino acid substitutions). 
See, also, FIG. 4, which identifies the free Cys residues in MTSP3, MTSP4 and
MTSP6. 

Table 1 of the specification identifies serine as a substitution for Cys (see page 34, line 6). 
The specification specifically describes the replacement of a free Cys of the protease domain 
with a serine in Example 1, which recites, on page 161, lines 4-9: 

To eliminate the free cysteine (at position 310 in SEQ ID No. 4) that exists 
when the protease domain of the MTSP3 protein is expressed or the zymogen is 
activated, the free cysteine at position 310 (see SEQ ID No. 3), which is Cys 122 
if a chymotrypsin numbering scheme is used, was replaced with a serine. 

Appellant respectfully submits that one of skill in the art would recognize that 
Appellant was in possession of a substantially purified single-chain polypeptide consisting 
only of a protease domain of a type-II membrane-type serine protease (MTSP) or a 
catalytically active fragment thereof as a single chain, where the MTSP protease domain or 
catalytically active fragment thereof has serine protease activity as a single chain and a free 
Cys in the protease domain is replaced with a serine. 




Dependent Claim 34 

Claim 34 depends from claim 1 and includes every limitation thereof. Claim 34 
recites that the MTSP is selected from among corin, MTSPl, enteropeptidase, human airway 
trypsin-like protease (HAT), TMPRSS2, and TMPRSS4. For the reasons articulated above 
with respect to claim 1, Appellant respectfully submits that one of skill in the art would
recognize that Appellant was in possession of a substantially purified single-chain 
polypeptide consisting only of a protease domain of a type-II membrane-type serine protease 
(MTSP) or a catalytically active fragment thereof as a single chain, where the MTSP protease 
domain or catalytically active fragment thereof has serine protease activity as a single chain 
and a free Cys in the protease domain is replaced with another amino acid. 

The specification specifically recites that the protease domains can be from any 
MTSP family member, including corin, MTSPl, enteropeptidase, human airway trypsin-like 
protease (HAT), TMPRSS2, and TMPRSS4. For example, see page 8, line 30 through page
10, line 2, which recites:

The protease domains provided herein include, but are not limited to, the single chain 
region having an N-terminus at the cleavage site for activation of the zymogen, through 
the C-terminus, or C-terminal truncated portions thereof that exhibit proteolytic activity 
as a single-chain polypeptide in in vitro proteolysis assays, of any MTSP family member, 
preferably from a mammal, including and most preferably human, that, for example, is 
expressed in tumor cells at different levels from non-tumor cells, and that is not 
expressed on an endothelial cell. These include, but are not limited to: MTSPl (or 
matriptase), MTSP3, MTSP4 and MTSP6. Other MTSP protease domains of interest 
herein, particularly for use in in vitro drug screening proteolytic assays, include, but are 
not limited to: corin (accession nos. AF133845 and AB013874; see, Yan et al. (1999) J. 
Biol. Chem. 274: 14926-14938; Tomita et al. (1998) J. Biochem. 124:784-789; Yan et al.
(2000) Proc. Natl. Acad. Sci. U.S.A. 97:8525-8529; SEQ ID Nos. 61 and 62 for the 
human protein); enteropeptidase (also designated enterokinase; accession no. U09860 for 
the human protein; see, Kitamoto et al. (1995) Biochem. 27: 4562-4568; Yahagi et al. 
(1996) Biochem. Biophys. Res. Commun. 219:806-812; Kitamoto et al. (1994) Proc. 
Natl. Acad. Sci. U.S.A. 91:7588-7592; Matsushima et al. (1994) J. Biol. Chem. 
269:19976-19982; see SEQ ID Nos. 63 and 64 for the human protein); human airway 
trypsin-like protease (HAT; accession no. AB002134; see Yamaoka et al. J. Biol. Chem.
273:11894-11901; SEQ ID Nos. 65 and 66 for the human protein); hepsin (see, accession
nos. M18930, AF030065, X70900; Yamaoka et al. (1988) J Biol Chem 27: 11895-11901;
Vu et al. (1997) J. Biol. Chem. 272:31315-31320; and Farley et al. (1993) Biochem.
Biophys. Acta 1173:350-352; SEQ ID Nos. 67 and 68 for the human protein); TMPRSS2
(see, Accession Nos. U75329 and AF113596; Paoloni-Giacobino et al. (1997) Genomics
44:309-320; and Jacquinet et al. (2000) FEBS Lett. 468: 93-100; SEQ ID Nos. 69 and 70
for the human protein) TMPRSS4 (see, Accession No. NM 016425; Wallrapp et al.
(2000) Cancer 60:2602-2606; SEQ ID Nos. 71 and 72 for the human protein); and
TADG-12 (also designated MTSP6, see SEQ ID Nos. 11 and 12; see International PCT
application No. WO 00/52044, which claims priority to U.S. application Ser. No. 
09/261,416). 




Hence, the application specifically describes the protease domain of MTSP family members 
corin, enteropeptidase, HAT, TMPRSS4 and TMPRSS2 and others. Appellant respectfully 
submits that, in view of the arguments set forth above with respect to claim 1 and the 
teaching in the specification, which describes the protease domain of each of corin, 
enteropeptidase, HAT, TMPRSS4 and TMPRSS2, one of skill in the art would recognize that 
Appellant was in possession of the subject matter of claim 34 at its effective filing date.

Dependent Claim 35

Claim 35 recites a conjugate that includes a) a polypeptide of claim 1, and
b) a targeting agent linked to the protein directly or via a linker, wherein the conjugate has
serine protease activity. For the reasons articulated above with respect to claim 1, Appellant
respectfully submits that one of skill in the art would recognize that Appellant was in
possession of a substantially purified single-chain polypeptide consisting only of a protease
domain of a type-II membrane-type serine protease (MTSP) or a catalytically active fragment
thereof as a single chain, where the MTSP protease domain or catalytically active fragment
thereof has serine protease activity as a single chain and a free Cys in the protease domain is
replaced with another amino acid.

The specification specifically discloses conjugates of single-chain protease domains 
conjugated to a targeting agent, e.g., at page 14, lines 19-26. The specification teaches that 
the conjugates can be prepared by chemical conjugation, recombinant DNA technology or 
combinations thereof, and provides detailed descriptions of chemical conjugation, including 
acid cleavable, photo-cleavable and heat sensitive linker technology and other linkers, fusion
proteins, peptide linkers, conjugation to targeting agents, and adsorption, absorption and/or 
covalent bonding to a solid support (see e.g., pages 123-131). 

Appellant respectfully submits that, in view of the arguments set forth above with
respect to claim 1 and the teaching in the specification, which describes conjugates of single- 
chain protease domains conjugated to a targeting agent, several different types of conjugation 
technologies for making the conjugates and exemplary conjugates, one of skill in the art 
would recognize that Appellant was in possession of the subject matter of claim 35 at its 
effective filing date. 

Dependent Claim 36 

Claim 36 depends from claim 35 and recites a conjugate that includes a targeting
agent that permits i) affinity isolation or purification of the conjugate; ii) attachment of the
conjugate to a surface; iii) detection of the conjugate; or iv) targeted delivery to a selected
tissue or cell. For the reasons articulated above with respect to claims 1 and 35, Appellant 
respectfully submits that one of skill in the art would recognize that Appellant was in 
possession of a conjugate that includes a substantially purified single-chain polypeptide
consisting only of a protease domain of a type-II membrane-type serine protease (MTSP) or a 
catalytically active fragment thereof as a single chain, where the MTSP protease domain or 
catalytically active fragment thereof has serine protease activity as a single chain and a free 
Cys in the protease domain is replaced with another amino acid and a targeting agent. 

The specification recites, e.g., at page 14, lines 19-26 and page 123, line 30 through
page 124, line 7, that the targeting agent of the conjugate permits affinity isolation or
purification of the conjugate; attachment of the conjugate to a surface; detection of the
conjugate; or targeted delivery to a selected tissue or cell. The specification teaches
exemplary targeting agents, including tissue specific or tumor specific monoclonal
antibodies, a growth factor or fragment thereof, such as FGF, EGF, PDGF, VEGF, cytokines,
including chemokines, and other such agents, a protein or peptide fragment that contains a
protein binding sequence, a nucleic acid binding sequence, a lipid binding sequence, a
polysaccharide binding sequence, or a metal binding sequence, or a linker for attachment to a
solid support (see, e.g., page 124, lines 8-17) as well as linkers that allow for attachment of
the conjugate to a surface (see, e.g., pages 131-136). The specification also describes the
construction of affinity binding pairs for isolation and/or purification of the conjugate (e.g.,
see page 131, lines 5-37).

Appellant respectfully submits that, in view of the arguments set forth above with
respect to claims 1 and 35 and the teaching in the specification, which describes several 
different types of targeting agents and methods of conjugating such targeting agents to isolated 
protease domains, one of skill in the art would recognize that Appellant was in possession of 
the subject matter of claim 36 at its effective filing date.

Dependent Claim 40

Claim 40 recites a solid support comprising two or more polypeptides of claim 1 
linked thereto either directly or via a linker. For the reasons articulated above with respect to 
claim 1, Appellant respectfully submits that one of skill in the art would recognize that
Appellant was in possession of a substantially purified single-chain polypeptide consisting
only of a protease domain of a type-II membrane-type serine protease (MTSP) or a
catalytically active fragment thereof as a single chain, where the MTSP protease domain or
catalytically active fragment thereof has serine protease activity as a single chain and a free 
Cys in the protease domain is replaced with another amino acid. 

The specification describes solid supports and methods for immobilizing MTSP
protein to solid supports (e.g., see pages 131-136). The specification teaches exemplary solid
supports, including supports having any required structure and geometry, such as beads,
pellets, disks, capillaries, hollow fibers, needles, solid fibers, random shapes, thin films and
membranes (e.g., page 132, lines 26-29). The specification teaches that a plurality of MTSP
protease domains, including two or more protease domains, can be attached to a solid support
(e.g., page 132, lines 4-8).

Appellant respectfully submits that, in view of the arguments set forth above with
respect to claim 1 and the teaching in the specification, which describes several different 
types of solid supports and methods of conjugating isolated protease domains to solid 
supports, one of skill in the art would recognize that Appellant was in possession of the 
subject matter of claim 40 at its effective filing date. 

Dependent Claim 41 

Claim 41 depends from claim 40 and recites that the polypeptides comprise an array.
The specification teaches that a plurality of MTSP protease domains can be attached to a
solid support (e.g., see page 132, lines 4-8). The instant specification defines an array as a
collection of elements containing three or more members and that, as in the case for an
addressable array, the members of the array can be immobilized to discrete identifiable loci
on the surface of a solid phase (e.g., see page 35, lines 14-20). Hence, for these reasons and
the reasons articulated above with respect to claims 1 and 40, Appellant respectfully submits
that one of skill in the art would recognize that Appellant was in possession of an array of
substantially purified single-chain polypeptides consisting only of a protease domain of a type-II
membrane-type serine protease (MTSP) or a catalytically active fragment thereof as a
single chain, where the MTSP protease domain or catalytically active fragment thereof has
serine protease activity as a single chain and a free Cys in the protease domain is replaced
with another amino acid.

Dependent Claim 42 

Claim 42 depends from claim 41 and recites that the array comprises polypeptides
having different MTSP protease domains. Claim 42 as originally filed recited that the array
comprises polypeptides having different MTSP protease domains. The specification teaches
that a plurality of MTSP protease domains can be attached to a solid support (e.g., see page
132, lines 4-8). Appellant respectfully submits that, for these reasons and the reasons
articulated above with respect to claims 1, 40 and 41, one of skill in the art would recognize
that Appellant was in possession of an array of substantially purified single-chain polypeptides
consisting only of a protease domain of a type-II membrane-type serine protease (MTSP) or a
catalytically active fragment thereof as a single chain, where the MTSP protease domains or
catalytically active fragments thereof are different, have serine protease activity as a single
chain and a free Cys in the protease domains is replaced with another amino acid.

Dependent Claim 113

Claim 113 recites a solid support comprising two or more polypeptides of claim 12
linked thereto either directly or via a linker. Claim 12 is not rejected under 35 U.S.C. 112,
first paragraph. The Examiner states that Appellant was in possession of the isolated
protease domains recited in claim 12, which is directed to the substantially purified
polypeptide of claim 1, where the MTSP protease domain consists of a sequence of amino
acid residues selected from among amino acids 615-855 of SEQ ID No. 2, amino acids 205-
437 of SEQ ID NO. 4, the amino acid residues set forth as SEQ ID No. 6 or as amino acids
217-443 in SEQ ID No. 12.

The specification describes solid supports and methods for immobilizing MTSP
protein to solid supports (e.g., see pages 131-136). The specification teaches exemplary solid
supports, including supports having any required structure and geometry, such as beads,
pellets, disks, capillaries, hollow fibers, needles, solid fibers, random shapes, thin films and
membranes (e.g., page 132, lines 26-29). The specification teaches that a plurality of MTSP
protease domains, including two or more protease domains, can be attached to a solid support
(e.g., page 132, lines 4-8).

Appellant respectfully submits that, because the Examiner admits that Appellant
was in possession of the polypeptide of claim 12 and in view of the teaching in the specification,
which describes several different types of solid supports and methods of conjugating isolated
protease domains to solid supports, including conjugating a plurality of isolated protease
domains to a solid support, one of skill in the art would recognize that Appellant was in
possession of the subject matter of claim 113 at its effective filing date.




Dependent Claim 114 

Claim 114 depends from claim 113 and specifies that the polypeptides comprise an
array. As discussed above, claim 113 recites a solid support that includes two or more
polypeptides of claim 12. Claim 12 is not rejected under 35 U.S.C. 112, first paragraph.
Thus, the Examiner agrees that Appellant was in possession of the subject matter of claim 12,
which is directed to the substantially purified polypeptide of claim 1, where the MTSP
protease domain consists of a sequence of amino acid residues selected from among amino
acids 615-855 of SEQ ID No. 2, amino acids 205-437 of SEQ ID NO. 4, the amino acid
residues set forth as SEQ ID No. 6 or as amino acids 217-443 in SEQ ID No. 12.

The specification teaches that a plurality of MTSP protease domains can be attached
to a solid support (e.g., see page 132, lines 4-8). The instant specification defines an array as
a collection of elements containing three or more members and that, as in the case for an
addressable array, the members of the array can be immobilized to discrete identifiable loci
on the surface of a solid phase (e.g., see page 35, lines 14-20). Hence, for the reasons
discussed above with respect to claim 1 and also because the Examiner has concluded that
Appellant was in possession of the subject matter of claim 12, and the specification teaches
and describes the other elements of claim 114, Appellant respectfully submits that one of skill
in the art would recognize that Appellant was in possession of an array of substantially
purified single-chain polypeptides consisting only of a protease domain of a type-II
membrane-type serine protease (MTSP) or a catalytically active fragment thereof as a single
chain, where the MTSP protease domain or catalytically active fragment thereof has serine
protease activity as a single chain and a free Cys in the protease domain is replaced with
another amino acid and where the MTSP protease domain consists of a sequence of amino
acid residues selected from among amino acids 615-855 of SEQ ID No. 2, amino acids 205-
437 of SEQ ID NO. 4, the amino acid residues set forth as SEQ ID No. 6 or as amino acids
217-443 in SEQ ID No. 12.



Summary

Appellant respectfully submits that the rejection of claims 1, 11, 20, 34-36, 40-42, 113
and 114 under 35 U.S.C. §112, first paragraph, as allegedly containing subject matter that
was not described in the specification in such a way as to reasonably convey to one skilled in
the art that the inventor, at the time the application was filed, had possession of the claimed
subject matter, is erroneous in law and fact and, therefore, should be reversed.




REJECTION OF CLAIMS 1, 11, 20, 34-36, 40-42, 113 AND 114 UNDER 35 U.S.C. 
§112, FIRST PARAGRAPH - Scope of Enablement 

Claims 1, 11, 20, 34-36, 40-42, 113 and 114 are rejected under 35 U.S.C. § 112, first
paragraph, because the specification allegedly fails to describe the claimed subject matter in 
such a way as to enable one skilled in the art to make and use the claimed subject matter 
commensurate in scope with these claims. The Examiner states that the specification is 
enabling for a polypeptide that includes amino acids 615-855 of SEQ ID NO:2, amino acids 
205-437 of SEQ ID NO:4, amino acids of SEQ ID NO:6 and amino acids 217-443 of SEQ ID 
NO:12. The Examiner alleges that the specification does not reasonably provide enablement
for a polypeptide consisting of any protease domain of any MTSP or catalytically active portion
thereof and concludes that the claims are drawn to polypeptides having undefined structure. 
The Examiner alleges that predictability of which changes in a protein's amino acid structure 
can be tolerated requires a knowledge of and guidance with regard to the sequence as to which 
amino acids, if any, are tolerant to modification and which are conserved, and detailed 
knowledge of how the protein's structure relates to function. It is alleged that it would require 
undue experimentation for one of skill in the art to make such modified polypeptides with an 
expectation of success because the result of such modifications is unpredictable. It is further
alleged that the claimed polypeptides encompass a large number of polypeptides and that the 
specification does not provide sufficient guidance on the nature of the changes that can be 
tolerated such that the proteins retain activity. In response to Appellant's arguments in the 
previous Response, evidencing the extensive knowledge in the art with respect to serine 
proteases, the Final Office Action argues that these arguments are not persuasive because the 
specification allegedly does not establish which specific amino acids in the protein's sequence 
can be modified such that the modified polypeptide continues to have proteolytic activity. The 
Examiner alleges that while the art may teach the general structure of MTSP and conserved 
amino acid sequences, protease domains, X-ray crystal structure and other attributes, such
teachings "will not reduce the burden of undue experimentation on those of ordinary skill in 
the art." Therefore, the Final Office Action concludes, it would require undue experimentation 
to produce claimed polypeptides. 

This rejection respectfully is traversed. The pending claims are directed to protease 
domains of MTSPs, a well-characterized family of proteins; there is no doubt that this family 
of proteins is well known and that those of skill in the art can identify members thereof. It is 
the instant application that teaches that the isolated single-chain protease domain possesses
protease activity and that formation of a two-chain structure (by virtue of disulfide bonding
with a Cys in the protease domain, which is free in the single chain form) is not needed. Thus 
the issue is not identification of an MTSP, but identification of a protease domain in an 
MTSP. The application clearly teaches how to identify a protease domain and how to replace 
the now free Cys that would have participated in forming a two chain structure. There are no 
issues regarding undue experimentation to isolate MTSPs. 

The specification teaches identification, preparation and isolation of protease domains 
and those of skill in the art, in view of the application, readily can identify and isolate a 
protease domain from any MTSP. As discussed above, with respect to the written description 
rejection, the claims are directed to isolated single chain protease domains. The specification 
teaches that those of skill in the art can identify protease domains and also teaches how to 
identify protease domains. One of skill in the art, in light of the specification, could prepare an 
isolated single chain protease domain, as claimed, for any MTSP and replace the now-free Cys 
with another amino acid. Hence there is no reason to limit the claims to particular species of 
the family, when one of skill in the art, in light of the disclosure, can identify all members of 
the genus. 

A. LEGAL STANDARDS - 35 U.S.C. §112, FIRST PARAGRAPH - ENABLEMENT 

The inquiry with respect to scope of enablement under 35 U.S.C. § 112, first paragraph,
is whether it would require undue experimentation to make and use the subject matter as 
claimed. A considerable amount of experimentation is permissible, particularly if it is routine 
experimentation. The amount of experimentation that is permissible depends upon a number of 
factors, which include: the quantity of experimentation necessary, the amount of direction or 
guidance presented, the presence or absence of working examples, the nature of the invention, 
the state of the prior art, the relative skill of those in the art, the predictability of the art, and the 
breadth of the claims (i.e., the "Wands factors"). In re Wands, 8 USPQ2d 1400 (Fed. Cir.
1988). 

The starting point in an evaluation of whether the enablement requirement is satisfied is 
an analysis of each claim to determine its scope. The focus of the inquiry is whether everything 
within the scope of the claim is enabled. As concerns the breadth of a claim relevant to
enablement, the only concern should be whether the scope of enablement provided to one skilled
in the art by the disclosure is commensurate with the scope of protection sought by the claims.
In re Moore, 439 F.2d 1232, 169 USPQ 236 (CCPA 1971). Once the scope of the claims is
addressed, a determination must be made as to whether one skilled in the art is enabled to make
and use the entire scope of the claimed invention without undue experimentation.

It is incumbent upon the Examiner to first establish a prima facie case of non-enablement.
In re Marzocchi, 439 F.2d 220, 223, 169 USPQ 367, 369-70 (CCPA 1971). The
requirements of 35 USC §112, first paragraph, can be fulfilled by the use of illustrative
examples or by broad terminology. In re Anderson, 176 USPQ 331, 333 (CCPA 1973):

... we do not regard section 112, first paragraph, as requiring a specific example 
of everything within the scope of a broad claim ... What the Patent Office is 
here apparently attempting is to limit all claims to the specific examples,
notwithstanding the disclosure of a broader invention. This it may not do.

In re Grimme, 274 F.2d 949, 952 (CCPA 1960):

It is manifestly impracticable for an applicant who discloses a generic 
invention to give an example of every species falling within it, or even to 
name every such species. It is sufficient if the disclosure teaches those skilled 
in the art what the invention is and how to practice it. 

This clause does not require "a specific example of everything within the scope of a 
broad claim." In re Anderson, 176 USPQ 331, at 333 (CCPA 1973), emphasis in original. 
Rather, the requirements of § 112, first paragraph "can be fulfilled by the use of illustrative 
examples or by broad terminology." In re Marzocchi et al., 169 USPQ 367 (CCPA
1971)(emphasis added). 

The law is clear that patent documents need not include subject matter that is known in 
the field of the invention and is in the prior art, for patents are written for persons experienced 
in the field of the invention. See Vivid Technologies, Inc. v. American Science and 
Engineering, Inc., 200 F.3d 795, 804, 53 USPQ2d 1289, 1295 (Fed. Cir. 1999) ("patents are 
written by and for skilled artisans"). To hold otherwise would require every patent document 
to include a technical treatise for the unskilled reader. Although an accommodation to the 
"common experience" of lay persons may be feasible, it is an unnecessary burden for inventors 
and has long been rejected as a requirement of patent disclosures. See Atmel Corp., 198 F.3d
at 1382, 53 USPQ2d at 1230 (Fed. Cir. 1999) ("The specification would be of enormous and
unnecessary length if one had to literally reinvent and describe the wheel."); W.L. Gore &
Assoc., Inc. v. Garlock, Inc., 721 F.2d 1540, 1556, 220 USPQ 303, 315 (Fed. Cir. 1983)
("Patents are written to enable those skilled in the art to practice the invention, not the public").

The test of enablement is whether one skilled in the art can make and use what is
claimed based upon the disclosure in the application and information known to those of skill in
the art without undue experimentation. United States v. Telectronics, Inc., 8 USPQ2d 1217
(Fed. Cir. 1988). A certain amount of experimentation is permissible as long as it is not undue.
Atlas Powder Co. v. E.I. DuPont de Nemours, 750 F.2d 1569, 224 USPQ 409 (1984). This
requirement can be satisfied by providing sufficient disclosure, either through illustrative
examples or terminology, to teach one of skill in the art how to make and how to use the
claimed subject matter without undue experimentation. In re Anderson, 176 USPQ 331, at 333
(CCPA 1973). The "invention" referred to in the enablement requirement of section 112 is the
claimed subject matter. Lindemann Maschinenfabrik v. American Hoist and Derrick Co., 730
F.2d 1452, 1463, 221 USPQ 481, 489 (Fed. Cir. 1984).

As a matter of Patent Office practice, then, a specification disclosure which 
contains a teaching of the manner and process of making and using the invention 
in terms which correspond in scope to those used in describing and defining the 
subject matter sought to be patented must be taken as in compliance with the 
enabling requirement of the first paragraph of § 112 unless there is reason to 
doubt the objective truth of the statements contained therein which must be relied 
on for enabling support. Assuming that sufficient reason for such doubt does 
exist, a rejection for failure to teach how to make and/or use will be proper on that 
basis; such a rejection can be overcome by suitable proofs indicating that the 
teaching contained in the specification is truly enabling. . . it is incumbent upon 
the Patent Office, whenever a rejection on this basis is made, to explain why it 
doubts the truth or accuracy of any statement in a supporting disclosure and to 
back up assertions of its own with evidence or reasoning which is inconsistent 
with the contested statement. 

Id. (emphasis in original); See also Fiers v. Revel, 984 F.2d 1164, 1171-72, 25 USPQ2d 1601,
1607 (Fed. Cir. 1993); Gould v. Mossinghoff, 229 USPQ 1, 13 (D.D.C. 1985), aff'd in part,
vacated in part, and remanded sub nom. Gould v. Quigg, 822 F.2d 1074, 3 USPQ2d 1302
("there is no requirement in 35 U.S.C. § 112 or anywhere else in patent law that a specification
convince persons skilled in the art that the assertions in the specification are correct"). A
patent application need not teach, and preferably omits, what is well known in the art.
Spectra-Physics, Inc. v. Coherent, Inc., 3 USPQ2d 1737 (Fed. Cir. 1987).

PTO GUIDELINES

The PTO has promulgated guidelines, which incorporate the above-noted law, for
examining chemical/biotechnical applications with respect to 35 U.S.C. §112, first paragraph,
enablement. As set forth in the guidelines, the standard for determining whether the 
specification meets the enablement requirement is whether it enables any person skilled in the 
art to make and use the claimed invention without undue experimentation. In re Wands, 858 
F.2d 731, 737, 8 USPQ2d 1400 (Fed. Cir. 1988). In determining whether any 
experimentation is "undue," consideration must be given to the above-noted factors. 






As indicated in the published guidelines, it is improper to conclude that a disclosure is 
not enabling based on an analysis of only one of the above factors while ignoring one or more 
of the others. The analysis must consider all the evidence related to each of the factors, and 
any conclusion of non-enablement must be based on the evidence as a whole. Id. 8 USPQ2d 
at 1404 & 1407. 

B. THE REJECTION OF CLAIMS 1, 11, 20, 34-36, 40-42, 113 AND 114 UNDER
35 U.S.C. §112, FIRST PARAGRAPH SHOULD BE REVERSED BECAUSE 
THE SPECIFICATION MEETS THE WRITTEN DESCRIPTION 
REQUIREMENT WITH RESPECT TO ENABLEMENT 

APPLICATION OF THE FACTORS ENUMERATED IN IN RE WANDS 
Claim 1 

It respectfully is submitted that analysis of enablement requires consideration of all of
the "Wands Factors" and that focusing on one or two of the factors is a misapplication of the
law. Appellant has discussed application of the "Wands Factors" in the previous responses. It
would not require undue experimentation to isolate single-chain protease domains from any 
MTSP polypeptide. Further, it would not require undue experimentation to make modifications 
thereto. The Examiner admits that enzyme isolation techniques and recombinant and 
mutagenesis techniques are known in the art, and that it is routine in the art to screen for 
substitutions or modifications, including multiple substitutions and multiple modifications as 
encompassed by the instant claims (see Final Office Action, Exhibit 2, page 11). As discussed 
in detail below, and previously, a consideration of the factors enumerated in In re Wands 
demonstrates that the application teaches how to make and use the subject matter as claimed 
without undue experimentation. 

i. Breadth of the Claims 

Claim 1 is directed to an isolated substantially purified single-chain polypeptide 
consisting only of a protease domain of a type-II membrane-type serine protease (MTSP) or a 
catalytically active fragment thereof as a single chain, wherein the protease domain or 
catalytically active fragment thereof has serine protease activity as a single chain and a free 
Cys in the protease domain is replaced with another amino acid. Claims 11, 20, 34-36, 40-42,
113 and 114 ultimately depend from claim 1 and recite additional features and specific
family members. Claim 11 is directed to the substantially purified polypeptide of claim 1,
and specifies that the MTSP is selected from among MTSP1, MTSP3, MTSP4 and MTSP6.

Claim 20 recites that a free Cys in the protease domain is replaced with a serine. 
Claim 34 recites particular polypeptides within the scope of claim 1. Claims 35 and 36 are
directed to conjugates including a polypeptide of claim 1 and a targeting agent linked to the
protein directly or via a linker. Claims 40-42 are directed to a solid support including two or 
more polypeptides of claim 1 linked thereto either directly or via a linker. Claims 113 and 
114 are directed to a solid support including two or more polypeptides of claim 12 linked 
thereto either directly or via a linker. 

Hence the claims include as an element an isolated protease domain of a member of 
the MTSP family in which a free Cys is replaced with another amino acid. The specification,
as noted, describes all MTSP family members known at the time of filing and provides four 
new members of the family and methods for identifying other members of the MTSP family. 
Thus, the claims are of the same scope as the disclosure in the application. 

ii. Level of Skill 

The level of skill in this art is recognized to be high (see, e.g., Ex parte Forman, 230
USPQ 546 (Bd. Pat. App. & Int. 1986)). The numerous articles and patents made of record
in this application address a highly skilled audience and further evidence the high level of 
skill in this art. 

iii. Teachings of the Specification 

As discussed above and previously, the specification teaches that MTSP polypeptides 
constitute a recognized well known and well characterized family of serine proteases. For 
example, page 18, lines 1-23 of the specification recites: 

As used herein, "transmembrane serine protease (MTSP)" refers to a family of 
transmembrane serine proteases that share common structural features as described herein
(see, also Hooper et al. (2001) J. Biol. Chem. 276:857-860). Thus, reference, for example,
to "MTSP" encompasses all proteins encoded by the MTSP gene family, including but are
not limited to: MTSP1, MTSP3, MTSP4 and MTSP6, or an equivalent molecule obtained
from any other source or that has been prepared synthetically or that exhibits the same
activity. Other MTSPs include, but are not limited to, corin, enteropeptidase, human
airway trypsin-like protease (HAT), MTSP1, TMPRSS2, and TMPRSS4. Sequences of
encoding nucleic molecules and the encoded amino acid sequences of exemplary MTSPs 
and/or domains thereof are set forth in SEQ ID Nos. 1-12, 49, 50 and 61-72. The term 
also encompasses MTSPs with conservative amino acid substitutions that do not 
substantially alter activity of each member, and also encompasses splice variants thereof. 
Suitable conservative substitutions of amino acids are known to those of skill in this art 
and may be made generally without altering the biological activity of the resulting 
molecule. Of particular interest are MTSPs of mammalian, including human, origin. 
Those of skill in this art recognize that, in general, single amino acid substitutions in non- 
essential regions of a polypeptide do not substantially alter biological activity (see, e.g., 
Watson et al. Molecular Biology of the Gene, 4th Edition, 1987, The Benjamin/Cummings 
Pub. Co., p.224). 




The specification teaches that a protease domain from an MTSP polypeptide is active as a 
single-chain polypeptide. Additionally, smaller fragments of the protease domain also are 
active as single-chain polypeptides (page 18, line 24-page 19, line 2): 

As used herein, a "protease domain of an MTSP" refers to the protease domain of 
MTSP that is located within the extracellular domain of a MTSP and exhibits serine 
proteolytic activity. It includes at least the smallest fragment thereof that acts catalytically 
as a single chain form. Hence it is at least the minimal portion of the extracellular domain 
that exhibits proteolytic activity as assessed by standard assays in vitro assays. Those of 
skill in this art recognize that such protease domain is the portion of the protease that is 
structurally equivalent to the trypsin or chymotrypsin fold. 

The specification further teaches that MTSP protease domains can vary in sequence but that 
these proteins retain a conserved structure as well as sequence identity to identified MTSP 
proteins exemplified in the application. For example, see page 19, lines 3-24, which recites: 

Exemplary MTSP proteins, with the protease domains indicated, are illustrated in 
Figures 1-3. Smaller portions thereof that retain protease activity are contemplated. The
protease domains vary in size and constitution, including insertions and deletions in
surface loops. They retain conserved structure, including at least one of the active site
triad, primary specificity pocket, oxyanion hole and/or other features of serine protease
domains of proteases. Thus, for purposes herein, the protease domain is a portion of a
MTSP, as defined herein, and is homologous to a domain of other MTSPs, such as corin,
enteropeptidase, human airway trypsin-like protease (HAT), MTSP1, TMPRSS2, and
TMPRSS4, which have been previously identified; it was not recognized, however, that an
isolated single chain form of the protease domain could function proteolytically in in vitro
assays. As with the larger class of enzymes of the chymotrypsin (S1) fold (see, e.g.,
Internet accessible MEROPS data base), the MTSPs protease domains share a high degree
of amino acid sequence identity. The His, Asp and Ser residues necessary for activity are 
present in conserved motifs. The activation site, which results in the N-terminus of the 
second chain in the two chain forms is has a conserved motif and readily can be identified 
(see, e.g., amino acids 801-806, SEQ ID No. 62, amino acids 406-410, SEQ ID No. 64; 
amino acids 186-190, SEQ ID No. 66; amino acids 161-166, SEQ ID No. 68; amino acids 
255-259, SEQ ID No. 70; amino acids 190-194, SEQ ID No. 72). 

The application describes the full length sequence and protease domain of all species of MTSP
family members known at the time of filing, including MTSP1, HAT, corin, enteropeptidase,
TMPRSS4 and TMPRSS2. The specification also identifies four new family members.

As discussed above, identification of the protease domain from an MTSP region merely 
requires identification of the activation cleavage site, as is outlined in the specification, 
discussed above and known in the art. The locus of the protease domain in the known MTSP 
family members is known, and the instant application provides protease domains from the 
known family members, either directly or by incorporation by reference.

Furthermore, notwithstanding that the specification provides and describes the protease 
domain of all members of the family known at the time of filing, plus the four additional family 
members, a comparison of sequence identity among family members (see, e.g., Figure 4 of the
application) reveals that the protease domains share conserved sequences, including the
catalytic triad of His, Asp and Ser residues and their surrounding conserved motifs. 
Additionally, the specification demonstrates that MTSP protease domains can have a 
reasonable amount of sequence variation and yet retain serine protease activity. MTSP1,
MTSP3, MTSP4 and MTSP6 protease domains share about 40% sequence identity with each 
other. The specification teaches that each of these protease domains is an example of an MTSP 
protease domain that has activity in the single chain form. 

The specification also teaches additional modifications. For example, see page 26,
lines 13-25, which recites:

Hence smaller portions of the protease domains, particularly the single chain domains, 
thereof that retain protease activity are contemplated. Such smaller versions will generally 
be C-terminal truncated versions of the protease domains. The protease domains vary in 
size and constitution, including insertions and deletions in surface loops. Such domains 
exhibit conserved structure, including at least one structural feature, such as the active site 
triad, primary specificity pocket, oxyanion hole and/or other features of serine protease 
domains of proteases. Thus, for purposes herein, the protease domain is a single chain 
portion of an MTSP, as defined herein, but is homologous in its structural features and 
retention of sequence of similarity or homology the protease domain of chymotrypsin or 
trypsin. Most significantly, the polypeptide will exhibit proteolytic activity as a single 
chain. 

The specification teaches that included in the conserved features of MTSP protease domain 
polypeptides is a catalytic triad as well as the activation cleavage site, which defines the 
terminus of the protease domain polypeptides when they are isolated as single chain 
polypeptides. 

The specification explains that beyond such conserved features the polypeptides are
tolerant of modification. The specification explains that such modifications can be effected
using numerous methods known in the art. For example, at page 77, line 17 through page 78,
line 11, the specification states:

A variety of modifications of the MTSP proteins and domains are 
contemplated herein. An MTSP-encoding nucleic acid molecule can be modified 
by any of numerous strategies known in the art (Sambrook et al., 1990, Molecular 
Cloning, A Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory, Cold
Spring Harbor, New York). The sequences can be cleaved at appropriate sites
with restriction endonuclease(s), followed by further enzymatic modification if
desired, isolated, and ligated in vitro. In the production of the gene encoding a 
domain, derivative or analog of MTSP, care should be taken to ensure that the 
modified gene retains the original translational reading frame, uninterrupted by 
translational stop signals, in the gene region where the desired activity is encoded. 

Additionally, the MTSP-encoding nucleic acid molecules can be mutated in 
vitro or in vivo, to create and/or destroy translation, initiation, and/or termination
sequences, or to create variations in coding regions and/or form new restriction
endonuclease sites or destroy pre-existing ones, to facilitate further in vitro 
modification. Also, as described herein muteins with primary sequence 
alterations, such as replacements of Cys residues and elimination of glycosylation 
sites are contemplated. Such mutations may be effected by any technique for 
mutagenesis known in the art, including, but not limited to, chemical mutagenesis 
and in vitro site-directed mutagenesis (Hutchinson et al., J. Biol. Chem. 253:6551-6558
(1978)), use of TAB® linkers (Pharmacia). In one embodiment, for
example, an MTSP protein or domain thereof is modified to include a fluorescent
label. In other specific embodiments, the MTSP protein is modified to have a
heterofunctional reagent, such heterofunctional reagents can be used to crosslink
the members of the complex. 

The specification exemplifies variation in MTSP sequences. For example the 
specification provides exemplary MTSP1, MTSP3, MTSP4 and MTSP6 sequences, including
the sequences of the isolated protease domains. The specification also provides sequences of
other family members, and, as discussed above, how to identify the protease domain based on
the consensus sequence thereof, which is conserved among serine proteases. The
specification explains that MTSP1 and MTSP3 amino acid sequences have about 43% identity
with each other (for example, see page 162, lines 1-2). The specification also discloses that
MTSP1 and MTSP4 have about 37% amino acid sequence identity (for example, see page
167, lines 25-29). The specification also teaches that MTSP4 and MTSP6 share about 60%
amino acid sequence identity (for example, see page 172, lines 4-9). The specification teaches
that each of the protease domains of these MTSP family members is active as a single chain that
contains only the protease domain or a smaller catalytically active portion of the protease 
domain (see, for example at page 20, lines 1-6). Hence, the specification teaches that MTSP 
protease domains that retain the conserved catalytic triad are tolerant of sequence modification 
yet retain activity, and demonstrates that exemplary polypeptides that retain the catalytic triad 
and that have about 40%-60% and greater sequence identity are active as single chain 
polypeptides. 

Notwithstanding differences among the sequences of the family members, the 
specification teaches and provides sequences of most of the family members, refers to 
publications that describe other family members, and teaches how to identify a protease domain.
As discussed above, the instant claims are not directed to discovery of MTSPs as a family, but 
the discovery that the isolated protease domain has activity as a single-chain isolated 
polypeptide. Once one of skill in the art has an MTSP of any type or sequence, one of skill in
the art can, based on the teachings in this specification, isolate the single chain protease domain
thereof. The specification clearly provides guidance for doing so.

The specification teaches modifications of the MTSP polypeptides. For example, the
specification provides exemplary modifications including conservative amino acid 
substitution (for example, see page 10, lines 3-13) and modifications of cysteine residues 
and/or of glycosylation sites (for example, see page 78, lines 1-7). The specification also 
discloses that non-natural amino acids can be introduced as a substitution or addition in the 
MTSP polypeptides (for example, see page 79, lines 10-21). 

More significantly, the pending claims are directed, not to full-length MTSPs, but to
isolated single-chain protease domains that have serine protease activity and in which a free
Cys is replaced with another amino acid. One of skill in the art, with an MTSP polypeptide in hand,
could readily identify and isolate the protease domain of any MTSP as claimed and replace a 
free Cys with another amino acid residue. 

iv. Knowledge of those of skill in the art

As discussed above, at the time of filing of the application and before, those of skill in the 
art were very familiar with serine proteases generally, and with the MTSP family in particular. 
The MTSP family was known as was the locus of the protease domain in members of the MTSP 
family. What was absent was any understanding or recognition that an isolated single chain 
protease domain would have activity; hence, such was never isolated. In view of the instant 
application teaching that such protease domains have activity as single chain polypeptides, the 
skilled artisan can readily isolate any protease domain of an MTSP as a single chain and if 
necessary test the isolated protease domain for the requisite activity. Nothing more need be 
known regarding the requisites for activity. 

Notwithstanding this, there was a large body of literature directed to serine proteases and 
there was general understanding of their structures and requisites for activity (see, for example,
Hooper et al., J. Biol. Chem. 276: 857-860 (2001), Exhibit 15; Nienaber et al., J. Biol. Chem.
275: 7239-7248 (2000), Exhibit 24; Sommerhoff et al., Proc. Natl. Acad. Sci. USA 96: 10984-10991
(1999), Exhibit 34; Lu et al., J. Mol. Biol. 292: 361-373 (1999), Exhibit 21; Xu et al., J.
Biol. Chem. 275: 378-385 (2000), Exhibit 41; Lin et al., J. Biol. Chem. 274: 18231-18236
(1999), Exhibit 20; and Bryan, Biochem. Biophys. Acta 1543: 200-203 (2000), Exhibit 7).
These references detail the existing crystal structures, structural comparisons and structural 
similarities of MTSPs. 




This extensive knowledge also is evidenced, for example, in the application as filed and
in the literature made of record in the submitted Information Disclosure Statements. As noted
in the application, the MTSP protease family was known (for example, see pages 4-5). Serine
proteases are a family that can be distinguished from many other types of proteins and enzymes
because they have highly conserved structures (see, e.g., Lin et al., J. Biol. Chem. 274: 18231-18236
(1999), Exhibit 20 and Yan et al., J. Biol. Chem. 274: 14926-14935 (1999), Exhibit 44).
Moreover, it was known at the time of filing that there is a correlation between
retention of the catalytic triad and retention of serine protease activity. Hence, available to one
of skill in the art was the knowledge that serine protease activity could be retained in a serine
protease by retaining the conserved structure of the catalytic triad (see, for example, Carter et
al., Nature 332: 564-568 (1988), Exhibit 8; Sprang et al., Science 237: 905-909 (1987), Exhibit
35; Craik et al., Science 237: 909-913 (1987), Exhibit 10; and Bachovchin et al., Proc. Natl.
Acad. Sci. 78: 7323-7326 (1981), Exhibit 5). In addition, other features were identified at the
time of filing and before as highly conserved features in serine proteases including a cleavage
site at the N-terminus of the protease domain, a substrate specificity pocket in the protease
domain and conserved cysteines that participate in disulfide bonding (see, for example, Figure 4
and page 18235 of Lin et al. (Exhibit 20) and Figure 2 and page 18236 of Yan et al., Exhibit
44). Thus, the requisites for retention of serine protease activity are well known and 
characterized and were available at the effective filing date of the claimed subject matter. 
Hence, a wide variety of structural information on serine proteases was well-known in the art. 

Furthermore, the instant claims only require identification of the protease domain of an 
MTSP, and its isolation as a single chain polypeptide. The specification includes and describes 
the protease domains of all MTSP family members known at the time of filing the application. 
Based on the teachings of the specification and what was known in the art, those of skill in the art can
readily identify the protease domain region in an MTSP using, e.g., the catalytic triad, the
cleavage site at the N-terminus of the protease domain and conserved cysteines that participate
in disulfide bonding as markers, and, if necessary, test it for protease activity. Dawson et al.
(U.S. Pat. No. 5,645,833 (1997), Exhibit 11) teaches that the serine protease domain can be
recognized by its homology with other serine proteases (col. 6, lines 29-32). 

The methods and guidance for comparing amino acid sequences to generate and
confirm sequences with sequence identity to an MTSP polypeptide sequence such as SEQ ID
NOS: 2, 4, 6 and 12 were available and routine in the art at the time of filing the instant
application. As described in the instant specification, computer algorithms such as the
"FASTA" program, using for example, the default parameters as in Pearson et al., Proc. Natl.
Acad. Sci. USA 85: 2444 (1988), Exhibit 28, were available. Other programs were available
(see Devereux, J., et al., Nucleic Acids Research 12(1):387 (1984), Exhibit 12). In addition,
methods for generating nucleotide and protein sequence variation were widely available in 
the art. Thus, one of skill in the art could use such programs with a serine protease sequence, 
for example, to align the sequence and identify the structural features of importance for 
retention of activity and use the methods for generating sequence variation to make protein 
variants. 

Methods for assaying protease activity including protease specificity, level of activity
and response to inhibitors were well known in the art (see, for example, Lu et al., J. Mol. Biol.
292: 361-373 (1999) (Exhibit 21) and Xu et al., J. Biol. Chem. 275: 378-385 (2000) (Exhibit
41)). Methods for high throughput assays and detection also were widely available (e.g., see
generally, Silverman et al., Curr. Opin. Chem. Biol., 2:397-403 (1998) (Exhibit 32) and
Sittampalam et al., Curr. Opin. Chem. Biol., 1:384-91 (1997) (Exhibit 33)). Hence, the
amount of knowledge of those of skill in the art was extensive and the requisite structural and
functional features required for protease activity were well known.

The Examiner states that the specific amino acid positions within a protein's sequence 
where amino acid modification can be made with a reasonable expectation of success in 
obtaining the desired activity are limited in any protein and the result of such modifications is 
unpredictable. Appellant respectfully disagrees in the case of the family of MTSPs. The 
application and the art made of record establish that MTSPs are well known in the art and the 
structural requirements for activity are known and that the instantly claimed polypeptides 
share sequence homology with the chymotrypsin/trypsin family for which tertiary structures 
are known. For example, it was known in the art that serine protease activity could be 
retained in an MTSP by retaining the conserved structure of the catalytic triad (see, e.g., Craik
et al., Science 237: 909-13 (1987), Exhibit 10 and Carter et al., Nature 332: 564-568 (1988),
Exhibit 8). Other highly conserved features in serine proteases also were known to the skilled 
artisan. These include a cleavage site at the N-terminus of the protease domain, a substrate 
specificity pocket in the protease domain and conserved cysteines that participate in disulfide 
bonding (see, e.g., Figure 4 and page 18235 of Lin et al. (Exhibit 20) and Figure 2 and page
18236 of Yan et al. (Exhibit 44)). The specification also provides exemplary assays for testing
catalytic activity of the polypeptides using routine experimental analysis techniques and also
provides descriptions of how to assess percentage identity and teaches that these techniques 
were well known in the art. The specification also teaches conserved characteristics among 
MTSPs. Furthermore, the MTSPs are a known family of serine proteases, and the protease 
domain of any member can be readily identified using methods and techniques known in the 
art and/or described in the specification. The serine proteases were among the first enzymes 
to be studied extensively (Perona & Craik, Protein Science 4: 337-360 (1995), Exhibit 30). 

Furthermore, the instant claims are directed to the single-chain protease domain or
active portion thereof, where the protease domain is modified to replace a free Cys with another
amino acid (for example to prevent aggregation by virtue of interaction among the free Cys
residues). The claims on appeal are directed not to new MTSPs per se, but to the protease
domains of MTSPs.

The Examiner states that recombinant and mutagenesis techniques and enzyme
isolation techniques are known and that it is routine to screen for multiple substitutions or
multiple modifications as encompassed by the instant claims (see Final Office Action, Exhibit
1, page 11). Thus, routine techniques can be used to identify or synthesize modified MTSP
serine protease domains. If needed, one of skill in the art can test polypeptides for catalytic
activity by routine experimentation using the assays provided in the specification or known to
those of skill in the art.

v. Working Examples

The application provides working examples that demonstrate each of the features of the
claimed polypeptides. For instance, the Examples provide detailed guidance for identifying
and isolating MTSP protease domains. Example 1 describes the cloning of the full-length and
the protease domain of MTSP3 and replacement of the free Cys in the isolated protease domain
with another amino acid. Example 1 also describes expression of the MTSP3 protease domain
with replaced Cys. Example 1 also describes the use of nucleic acid encoding the probe to assess
tissue-specific and tumor-specific expression of the MTSP3.

Example 2 describes the identification and cloning of two MTSP4 polypeptides,
MTSP4-S and MTSP4-L. Example 2 describes cloning of the full-length polypeptides and also
the protease domains thereof, and also describes uses of the clones to obtain gene expression
profiles. Example 3 describes the identification and cloning of an MTSP6 polypeptide and
protease domain thereof, and also gene expression profiles. Example 4 describes expression of
the MTSP4 (both variants), MTSP3 and MTSP6 protease domains, with the replaced Cys.
Example 6 describes cloning and isolation of the protease domain of MTSP1. Example 7
describes production of the protease domain of MTSP1 and purification of the protease
domain. In each case, an MTSP polypeptide sequence is identified that includes a protease
domain with a cleavage site and a catalytic triad (see, e.g., Figure 4). As noted, for example, in
Example 1, identification of MTSP3 as a serine protease required only 43% sequence identity.
Similarly, Example 2 demonstrates that 37% sequence identity with MTSP1 was sufficient to
identify MTSP4. 

The Examples demonstrate additional features of the claimed polypeptides. For 
example, the examples demonstrate production and expression of MTSP protease domains, 
where the free Cys is replaced with another amino acid. The working examples further
demonstrate that the MTSP polypeptides, sharing, for example, 37-43% sequence identity, are
active as single chain protease domains.

The Examples demonstrate expression of single chain protease domains. Examples 4 
and 5 describe additional expression of MTSP3, MTSP4 and MTSP6 using Pichia pastoris. 
Examples 6 and 7 provide a detailed description of the cloning, expression and purification of 
an MTSP1 single chain protease domain. Example 8 provides detailed serine protease assays
for MTSP1. Additionally, the examples demonstrate replacement of the free Cys. For
example, Example 1 demonstrates that replacing the cysteine with serine does not substantially
alter serine protease activity. The examples demonstrate identification of a variety of MTSPs,
sharing 37-43% sequence identity, and the expression of the protease domains thereof, where 
the Cys is replaced with another amino acid. 



vi. Predictability

The predictability at issue is whether one of skill in the art could isolate protease
domains from MTSP family members and variants thereof. The issue is not whether the
claims encompass variant MTSPs, but whether one of skill in the art in possession of an
MTSP could prepare an isolated protease domain in which a free Cys is replaced with another
amino acid. Predictability goes to reproducibility. Issues regarding modification of MTSPs
and requisites therefor are irrelevant. Appellant respectfully submits that one of skill in the
art, given the instant disclosure, could predictably make such polypeptides, because the
MTSP family is well known and characterized and the sequences of exemplary new family
members, as well as all known members, are provided in the application. One of skill in the
art can readily make minor amino acid variations using routine techniques, and, if needed, test
such polypeptide variants for serine protease activity. The working examples demonstrate
repeating this with 5 different polypeptides (MTSP1, MTSP3, MTSP4-S, MTSP4-L and
MTSP6). There is no doubt that isolation of a protease domain from an MTSP is reproducible 
and, thus, predictable. There is no doubt that one of skill in the art could prepare an isolated 
protease domain as claimed using techniques routinely practiced in this art. 

In contrast to the allegations of "unpredictability" set forth in the Final Office Action, 
the specification and the knowledge in the art evidence many factors of predictability with 
respect to MTSP polypeptide variants. First, the specification identifies all known MTSP 
family members, including the sequences thereof (in the sequence listing and/or by 
incorporation by reference of others) and also provides new family members. These are 
defined chemical structures from which one of skill in the art is given a reference point. As 
explained above, included among exemplary polypeptides are MTSP1, MTSP3, MTSP4-S, 
MTSP4-L, MTSP6, HAT, corin, enteropeptidase, TMPRSS4 and TMPRSS2. The 
specification demonstrates that these MTSP polypeptides, as well as all family members, share 
conserved features including a protease domain with a catalytic triad and N-terminal activation 
cleavage site. Furthermore, the specification teaches isolation of the protease domains as 
single chains and demonstrates that they possess proteolytic activity. As discussed above, the 
specification provides detailed guidance for identifying a protease domain of any MTSP family 
member. 

Second, the specification delineates structural and functional features of the protein. 
These features identify key regions and residues that one of skill in the art would know to 
conserve in order to retain serine protease activity. These features also provide reference 
points for alignments with other known serine proteases. These features also allow one of 
skill in the art to make further structure-function correlations, again providing predictable 
correlations of regions and residues to conserve or change. As evidenced by the references 
cited in the specification and in the Information Disclosure Statements of record in this 
application and provided herein, a large body of knowledge pertaining to structure-function 
relationships of serine proteases was known in the art. In addition, the specification provides 
exemplary assays to assess serine protease activity, including a variety of substrates, for 
MTSP activity. One of skill in the art can readily and routinely test any MTSP family 
member protease domain or a variant thereof for serine protease activity as a single chain 
protease. 

As taught in the specification as well as evidenced by the art of record, maintenance 
of the catalytic triad is sufficient to retain serine protease activity (see, e.g., Carter et al., 
Nature 332: 564-568 (1988), Exhibit 8, and Craik et al., Science 237: 909-913 (1987), 
Exhibit 10). Therefore, one of skill in the art could make MTSP family 
member protease domains from any MTSP known to one of skill in the art or identify 
protease domains in new MTSP family members. In the unlikely event that it was needed, 
protease activity could easily and routinely be confirmed using the assays provided in the 
application and known in the art. The routine manipulations to identify and isolate an MTSP 
protease domain as a single chain are known in the art. 

The experimentation necessary to isolate and use protease domains of MTSP 
polypeptides, as described above, is commonly practiced in this art and routine. "Enablement 
is not precluded by the necessity for some experimentation such as routine screening. 
Experimentation needed to practice the invention must not be undue experimentation. 'The key 
word is undue, not experimentation.' " In re Wands, 858 F.2d at 737-38 (quoting In re 
Angstadt, 537 F.2d at 504; emphasis added; additional internal citations omitted). The 
Examiner admits that enzyme isolation techniques and recombinant and mutagenesis 
techniques are known and that it is routine to screen for multiple substitutions or multiple 
modifications as encompassed by the instant claims (see Final Office Action, Exhibit 2, page 
11). The art related to serine proteases also demonstrates that such experimentation is not 
undue. For example, Pearson et al. (Cabios Invited Review 13(4): 325-332 (1997), Exhibit 
29) explains that serine proteases share a conserved catalytic site, the catalytic triad, and have 
several diagnostic motifs throughout the protein, including a conserved protein fold and 
anti-parallel barrel structures that contribute to the function of the protease. Pearson et al. states 
that one could recognize proteins that have protease activity based on these conserved 
structures. Hence, generation of variants with serine protease activity is routine because one of 
skill in the art can use such conserved features as a guide for designing the location of 
variations to maintain these features. In addition, Cheah et al. (J. Biol. Chem. 265: 7180-7187 
(1990), Exhibit 9) provides a demonstration of the predictability of generating variants of 
serine proteases based on an exemplary sequence. Cheah et al. uses known structural and 
functional information about trypsin-like serine proteases to obtain mutations in a rhinovirus 
3C protease with predicted functional phenotypes. Thus, the art available at the time of filing, 
and before, demonstrates that one of skill in the art could make variants of a serine protease in 
a predictable manner. Therefore, one of skill in the art could make protease domains as single 
chains from an MTSP family member and also generate variants of MTSP polypeptides, using 
routine biotechnology techniques. Activity of the single chain protease domains and variants 
thereof could easily and routinely be confirmed using the assays provided in the application 
and known in the art. The routine manipulations to generate an MTSP single chain protease 
domain are not unpredictable. 

As discussed above, the issue is not whether the claims encompass variant MTSPs, but 
whether one of skill in the art in possession of an MTSP could prepare an isolated protease 
domain in which a free Cys is replaced with another amino acid. The instant application 
identifies MTSP polypeptides and exemplifies that isolated serine protease domains possess 
serine protease activity as a single chain. Such single chain activity had not 
been demonstrated before the instant application. The application provides adequate 
description to demonstrate that a common feature among the MTSP family members is the 
activity of a single chain form that includes the protease domain or catalytically active portions 
thereof in the absence of other MTSP portions. The application provides exemplary MTSPs 
that share about 40% sequence identity and possess such features. As discussed, the working 
examples demonstrate reproducibility, producing 5 different protease domains. Therefore, the 
specification demonstrates that by following the teachings of the application, one of skill in the 
art can predictably identify, make and use substantially purified polypeptides consisting of an 
MTSP protease domain or catalytically active fragment thereof having serine protease activity 
as a single chain. 



vii. The amount of experimentation required 

There is nothing of record to suggest that production or use of any of the claimed 
polypeptides would require development of new procedures, techniques or excessive 
experimentation. Protein extraction, purification and synthesis methods have been used for 
decades. The specification provides a detailed working example for fermentation and isolation 
of an MTSP protease domain. As discussed above, MTSP family members are provided and 
described in the application and are well known in the art. The specification and the art 
describe conserved features that can be used to identify MTSP family members and the 
protease domain thereof. Such features include the catalytic triad, an N-terminal activation 
cleavage site and conserved cysteines that participate in disulfide bonding. If needed, assays 
for evaluating activity of the polypeptides are taught in the specification and are known in the 
art. Such assays are routine in this art and do not require excessive experimentation. 

The Examiner states that recombinant and mutagenesis techniques and enzyme 
isolation techniques are known and that it is routine to screen for multiple substitutions or 
multiple modifications as encompassed by the instant claims (see Final Office Action, Exhibit 
1, page 11). As discussed, mutagenesis methods are not required to make and use the 
polypeptides as claimed. The instant claims are directed to isolated protease domains of 
MTSP family members; one of skill in the art can identify and isolate the protease domain of 
any MTSP family member, identify a free Cys and replace it with another amino acid as 
described in the application. Hence, the claimed polypeptides can be synthesized, isolated and 
characterized using routine testing, and, if necessary, one of skill in the art can test 
polypeptides for catalytic activity by routine experimentation using the assays provided in the 
specification or known to those of skill in the art. Appellant notes that "a considerable amount of 
experimentation is permissible, if it is merely routine . . . ." In re Wands, 858 F.2d 731, 737. 

Conclusion 

In light of the breadth of the claims, the extensive teachings and examples in the 
specification, the high level of skill of those in this art, the knowledge of those of skill in the 
art, and the fact that identification and isolation of protease domains in MTSP family members and 
preparation of single chain forms thereof as well as variants thereof are predictable and 
reproducibly demonstrated, it would not require undue experimentation for one of skill in the 
art to make and use polypeptides with the features as claimed. Hence, a consideration of the 
factors enumerated above leads to the conclusion that undue experimentation would not be 
required to make and use the isolated MTSP protease domains as claimed. Accordingly, 
Appellant respectfully submits that this rejection of claim 1 under 35 U.S.C. §112, first 
paragraph, is erroneous in law and fact and, therefore, should be reversed. 

For the reasons above, each of the dependent claims meets the written description 
requirement and is enabled. Additional reasons for each dependent claim are 
described below. 

Dependent Claim 11 

Claim 11 depends from claim 1 and includes every limitation thereof. Claim 11 recites 
that the MTSP of the polypeptide of claim 1 is selected from among MTSP1, MTSP3, MTSP4 
and MTSP6. The arguments set forth above with respect to claim 1 are incorporated herein. 
The specification describes MTSP1 and its protease domain, e.g., at pages 54-58. The 
specification describes MTSP3 and its protease domain, e.g., at pages 58-60 and Example 1 (pages 
160-167). The specification describes MTSP4 and its protease domain, e.g., at pages 60-63 and 
Example 2 (pages 167-171). The specification describes MTSP6 and its protease domain, e.g., at 
pages 63-64 and Example 3 (pages 171-176). The working examples demonstrate cloning of 
the protease domains, with replaced free Cys, for each of these. 

In light of the breadth of the claims, the extensive teachings and examples in the 
specification, the high level of skill of those in this art, the knowledge of those of skill in the 
art, and the fact that it is predictable to identify protease domains in MTSP family members 
and prepare single chain forms thereof as well as variants thereof, it would not require undue 
experimentation for one of skill in the art to make and use polypeptides with the features as 
claimed. Hence, a consideration of the factors enumerated above leads to the conclusion that 
undue experimentation would not be required to make and use the isolated MTSP protease 
domains of MTSP1, MTSP3, MTSP4 or MTSP6 of claim 11. Accordingly, Appellant 
respectfully submits that this rejection of claim 11 under 35 U.S.C. §112, first paragraph, is 
erroneous in law and fact and, therefore, should be reversed. 
Dependent Claim 20 

Claim 20 depends from claim 1 and includes every limitation thereof. The arguments 
set forth above with respect to claim 1 are incorporated herein. Claim 20 recites that the free 
Cys be replaced with a serine. The Examiner admits that recombinant and mutagenesis 
techniques are known in the art (see Final Office Action, Exhibit 2, page 11). The 
specification exemplifies the replacement of a free Cys in the protease domain with a serine 
residue. For example, see Example 1, which recites, on page 161, lines 4-9: 

To eliminate the free cysteine (at position 310 in SEQ ID No. 4) that exists when the 
protease domain of the MTSP3 protein is expressed or the zymogen is activated, the 
free cysteine at position 310 (see SEQ ID No. 3), which is Cys 122 if a chymotrypsin 
numbering scheme is used, was replaced with a serine. 

Similarly, the working Examples provide MTSP4s, MTSP6 and MTSP1 with the free Cys 
replaced with serine. One of skill in the art readily can identify the protease domain of any 
MTSP family member, identify a free Cys and replace it with a serine residue. Such 
substitutions of amino acids are predictable and routine in the art. 

In light of the breadth of claim 20, the extensive teachings and examples in the 
specification, the high level of skill of those in this art, the knowledge of those of skill in the 
art, and the fact that it is predictable to replace a Cys with another amino acid residue, such as a 
serine residue, it would not require undue experimentation for one of skill in the art to make 
and use polypeptides with the features as claimed. Hence, a consideration of the factors 
enumerated above leads to the conclusion that undue experimentation would not be required to 
make and use the isolated MTSP protease domains of claim 20. Accordingly, Appellant 
respectfully submits that this rejection of claim 20 under 35 U.S.C. §112, first paragraph, is 
erroneous in law and fact and, therefore, should be reversed. 
Dependent Claim 34 

Claim 34 depends from claim 1 and includes every limitation thereof. The arguments 
set forth above with respect to claim 1 are incorporated herein. Claim 34 recites the MTSP is 
selected from among corin, MTSPl, enteropeptidase, human airway trypsin-like protease 
(HAT), TMPRSS2, and TMPRSS4. For the reasons articulated above with respect to claim 
1, Appellant respectfully submits that the specification is enabling for preparation and use of 
a substantially purified single-chain polypeptide consisting only of a protease domain of a 
type-II membrane-type serine protease (MTSP) or a catalytically active fragment thereof as a 
single chain, where the MTSP protease domain or catalytically active fragment thereof has 
serine protease activity as a single chain and a free Cys in the protease domain is replaced 
with another amino acid. 

The specification specifically recites that the protease domains can be from any 
MTSP family member, including corin, MTSPl, enteropeptidase, human airway trypsin-like 
protease (HAT), TMPRSS2, and TMPRSS4. For example, see page 8, line 30 through page 
10, line 2, which recites: 

The protease domains provided herein include, but are not limited to, the single chain 
region having an N-terminus at the cleavage site for activation of the zymogen, through 
the C-terminus, or C-terminal truncated portions thereof that exhibit proteolytic activity 
as a single-chain polypeptide in in vitro proteolysis assays, of any MTSP family member, 
preferably from a mammal, including and most preferably human, that, for example, is 
expressed in tumor cells at different levels from non-tumor cells, and that is not 
expressed on an endothelial cell. These include, but are not limited to: MTSPl (or 
matriptase), MTSP3, MTSP4 and MTSP6. Other MTSP protease domains of interest 
herein, particularly for use in in vitro drug screening proteolytic assays, include, but are 
not limited to: corin (accession nos. AF133845 and AB013874; see, Yan et al. (1999) J. 
Biol. Chem. 274:14926-14938; Tomia et al. (1998) J. Biochem. 124:784-789; Uan et al. 
(2000) Proc. Natl. Acad. Sci. U.S.A. 97:8525-8529; SEQ ID Nos. 61 and 62 for the 
human protein); enteropeptidase (also designated enterokinase; accession no. U09860 for 
the human protein; see, Kitamoto et al. (1995) Biochem. 27: 4562-4568; Yahagi et al. 
(1996) Biochem. Biophys. Res. Commun. 219:806-812; Kitamoto et al. (1994) Proc. 
Natl. Acad. Sci. U.S.A. 91:7588-7592; Matsushima et al. (1994) J. Biol. Chem. 
269:19976-19982; see SEQ ID Nos. 63 and 64 for the human protein); human airway 
trypsin-like protease (HAT; accession no. AB002134; see Yamaoka et al. J. Biol. Chem. 
273:11894-11901; SEQ ID Nos. 65 and 66 for the human protein); hepsin (see, accession 
nos. M18930, AF030065, X70900; Yamaoka et al. (1988) J Biol Chem 27:11895-11901; 
Vu et al. (1997) J. Biol. Chem. 272:31315-31320; and Farley et al. (1993) Biochem. 
Biophys. Acta 1173:350-352; SEQ ID Nos. 67 and 68 for the human protein); TMPRS2 
(see, Accession Nos. U75329 and AF113596; Paoloni-Giacobino et al. (1997) Genomics 
44:309-320; and Jacquinet et al. (2000) FEBS Lett. 468: 93-100; SEQ ID Nos. 69 and 70 
for the human protein) TMPRSS4 (see, Accession No. NM_016425; Wallrapp et al. 
(2000) Cancer 60:2602-2606; SEQ ID Nos. 71 and 72 for the human protein); and 
TADG-12 (also designated MTSP6, see SEQ ID Nos. 11 and 12; see International PCT 
application No. WO 00/52044, which claims priority to U.S. application Ser. No. 
09/261,416). 

The application describes the protease domain of MTSP family members corin, MTSP1, 
enteropeptidase, HAT, TMPRSS4 and TMPRSS2. Each of the specified MTSP family 
members is known and characterized in the art. In view of the instant application teaching 
that such protease domains have activity as single chain polypeptides, the skilled artisan can 
readily isolate the protease domain of any of corin, MTSP1, enteropeptidase, human airway 
trypsin-like protease (HAT), TMPRSS2, and TMPRSS4 as a single chain and replace the free 
Cys with another amino acid using routine techniques and, if necessary, test the isolated 
protease domain for the requisite activity. 

Appellant respectfully submits that, in view of the arguments set forth above with 
respect to claim 1 and the teaching in the specification, which describes the MTSP family 
members corin, enteropeptidase, HAT, TMPRSS4 and TMPRSS2, the breadth of claim 34, 
the extensive teachings and examples in the specification, the high level of skill of those in 
this art, the knowledge of those of skill in the art, and the fact that it is predictable to isolate a 
protease domain and replace a Cys with another amino acid residue, it would not require 
undue experimentation for one of skill in the art to make and use polypeptides with the 
features of claim 34. Hence, a consideration of the factors enumerated above leads to the 
conclusion that undue experimentation would not be required to make and use the isolated 
MTSP protease domains of claim 34. Accordingly, Appellant respectfully submits that this 
rejection of claim 34 under 35 U.S.C. §112, first paragraph, is erroneous in law and fact and, 
therefore, should be reversed. 
Dependent Claim 35 

Claim 35 is directed to a conjugate that includes a) a polypeptide of claim 1, and b) a 
targeting agent linked to the protein directly or via a linker, wherein the conjugate has serine 
protease activity. The arguments set forth above with respect to claim 1 are incorporated 
herein. The specification defines a "targeting agent" on page 38, lines 9-15, as: 

any moiety, such as a protein or effective portion thereof, that provides specific 
binding of the conjugate to a cell surface receptor, which, preferably, internalizes 
the conjugate or MTSP portion thereof. A targeting agent may also be one that 
promotes or facilitates, for example, affinity isolation or purification of the 
conjugate; attachment of the conjugate to a surface; or detection of the conjugate 
or complexes containing the conjugate. 

The specification teaches that the conjugates can be prepared by chemical conjugation, 
recombinant DNA technology or combinations thereof, and provides detailed descriptions of 
chemical conjugation, including acid cleavable, photo-cleavable and heat sensitive linker 
technology and other linkers, preparation of fusion proteins, peptide linkers, conjugation to 
targeting agents, and adsorption, absorption and/or covalent bonding to a solid support (see, 
e.g., pages 123-131). For example, the specification teaches that for the fusion proteins, the 
peptide or fragment thereof is linked to either the N-terminus or C-terminus of the MTSP 
protein domain (e.g., see page 124, lines 25-26). The specification teaches that chemical 
conjugation also can be used to form conjugates, where the MTSP protein domain is linked 
via one or more selected linkers or directly to the targeting agent (e.g., see page 126, lines 2-3). 
The specification describes various types of linkers and describes examples of various 
linkers, including peptide linkers and chemical linkers, such as acid cleavable, 
photo-cleavable and heat cleavable linkers (e.g., see pages 127-130). Methods of preparing protein 
conjugates are well known and routine in the art (e.g., see Brinkley, "A Brief Survey of 
Methods for Preparing Protein Conjugates with Dyes, Haptens, and Cross-linking Reagents" 
in Perspectives in Bioconjugate Chemistry (Claude Meares, ed., 1993), Chapter 4, pages 59-70, 
Exhibit 6). Hence, routine techniques can be used to conjugate isolated protease domains 
to a targeting agent. 

Appellant respectfully submits that, in view of the arguments set forth above with 
respect to claim 1 and the teaching in the specification, which describes conjugates of single- 
chain protease domains conjugated to a targeting agent, several different types of conjugation 
technologies for making the conjugates and exemplary conjugates, the breadth of claim 35, 
the high level of skill of those in this art, the knowledge of those of skill in the art, and the 
fact that it is routine and predictable to conjugate a polypeptide to a targeting agent, it would 
not require undue experimentation for one of skill in the art to make and use conjugates with 
the features of claim 35. Hence, a consideration of the factors enumerated above leads to the 
conclusion that undue experimentation would not be required to make and use the conjugates 
of claim 35. Accordingly, Appellant respectfully submits that this rejection of claim 35 under 
35 U.S.C. §112, first paragraph, is erroneous in law and fact and, therefore, should be 
reversed. 

Dependent Claim 36 

Claim 36 depends from claim 35 and recites a conjugate that includes a targeting 
agent that permits i) affinity isolation or purification of the conjugate; ii) attachment of the 
conjugate to a surface; iii) detection of the conjugate; or iv) targeted delivery to a selected 
tissue or cell. The arguments set forth above with respect to claims 1 and 35 are incorporated 
herein. 

The specification recites, e.g., at page 14, lines 19-26 and page 123, line 30 through 
page 124, line 7, that the targeting agent of the conjugate permits affinity isolation or 
purification of the conjugate; attachment of the conjugate to a surface; detection of the 
conjugate; or targeted delivery to a selected tissue or cell. The specification teaches 
exemplary targeting agents, including tissue specific or tumor specific monoclonal 
antibodies, a growth factor or fragment thereof, such as FGF, EGF, PDGF, VEGF, cytokines, 
including chemokines, and other such agents, a protein or peptide fragment that contains a 
protein binding sequence, a nucleic acid binding sequence, a lipid binding sequence, a 
polysaccharide binding sequence, or a metal binding sequence, or a linker for attachment to a 
solid support (see, e.g., page 124, lines 8-17 and pages 131-136). The specification also 
describes the construction of affinity binding pairs for isolation and/or purification of the 
conjugate (e.g., see page 131, lines 5-37). Methods of preparing protein conjugates are well 
known and routine in the art (e.g., see Brinkley, supra, Exhibit 6). Hence, routine, 
reproducible techniques well known to the skilled artisan can be used to conjugate isolated 
protease domains to a targeting agent. 

Appellant respectfully submits that, in view of the arguments set forth above with 
respect to claims 1 and 35, and the teaching in the specification, which describes single-chain 
protease domains conjugated to a targeting agent and the use of such targeting agents for 
affinity isolation or purification of the conjugate or attachment of the conjugate to a surface 
or detection of the conjugate or targeted delivery to a selected tissue or cell, the breadth of 
claim 36, the high level of skill of those in this art, the knowledge of those of skill in the art, 
and the fact that it is routine and predictable to conjugate a polypeptide to a targeting agent, it 
would not require undue experimentation for one of skill in the art to make and use 
conjugates with the features of claim 36. Hence, a consideration of the factors enumerated 
above leads to the conclusion that undue experimentation would not be required to make and 
use the conjugates of claim 36. Accordingly, Appellant respectfully 
submits that this rejection of claim 36 under 35 U.S.C. §112, first paragraph, is erroneous in 
law and fact and, therefore, should be reversed. 
Dependent Claims 40 and 41 

Claim 40 recites a solid support comprising two or more polypeptides of claim 1 
linked thereto either directly or via a linker. Claim 41 depends from claim 40 and recites that 
the polypeptides comprise an array. The arguments set forth above with respect to claim 1 
are incorporated herein. 

The specification describes solid supports and methods for immobilizing MTSP 
protein, such as a protease domain, to solid supports (e.g., see pages 131-136). For example, 
the specification teaches exemplary solid supports, including supports having any required 
structure and geometry, such as beads, pellets, disks, capillaries, hollow fibers, needles, solid 
fibers, random shapes, thin films and membranes (e.g., page 132, lines 26-29). The 
specification teaches that the solid support can be of any suitable material, such as inorganics, 
natural polymers, and synthetic polymers, including, cellulose, cellulose derivatives, acrylic 
resins, glass, silica gels, polystyrene, gelatin, polyvinyl pyrrolidone, co-polymers of vinyl and 
acrylamide, polystyrene cross-linked with divinylbenzene, polyacrylamides, latex gels, 
polystyrene, dextran, polyacrylamides, rubber, silicon, plastics, nitrocellulose, celluloses, 
natural sponges and highly porous glasses (e.g., page 134, lines 1-30). 

The specification teaches that a plurality of MTSP protease domains, including two or 
more protease domains, can be attached to a solid support (e.g., page 132, lines 4-8). The 
instant specification defines an array as a collection of elements containing three or more 
members and states that, as in the case of an addressable array, the members of the array can be 
immobilized to discrete identifiable loci on the surface of a solid phase (see, e.g., page 35, 
lines 14-20). 

The specification teaches that the polypeptide can be linked to the solid support 
directly or via a linker (e.g., page 132, lines 1-2). The specification describes various linking 
technologies that can be used to link the polypeptide to the solid support (e.g., page 135, lines 
1-30). These include reacting the protein with a reactive moiety on the solid support and the 
specification describes exemplary reactive moieties, including amino silane linkages, 
hydroxyl linkages, carboxysilane linkages, N-[3-(triethoxysilyl)propyl]phthalamic acid, 
bis-(2-hydroxyethyl)aminopropyltriethoxysilane, derivatized polystyrenes (page 133, lines 7-26), 
absorption and adsorption or covalent binding to the support, either directly or via a 
linker, such as through disulfide linkages, thioether bonds, and covalent bonds between free 
reactive groups, such as amine and thiol groups, known to those of skill in the art (page 135, lines 
11-26). Linking a protein to a solid support is routine in the biotechnology arts (e.g., see 
Means & Feeney, "Chemical Modifications of Proteins: History and Applications" in 
Perspectives in Bioconjugate Chemistry (Claude Meares, ed., 1993), Chapter 2, pages 10-20, 
Exhibit 23). The skilled artisan can select the appropriate conjugation chemistry based on the 
nature of the polypeptide and the solid support without undue experimentation and conjugate 
the protease domain to the solid support using routine techniques known in the art. 

In light of the breadth of claims 40 and 41, the extensive teachings in the specification 
with respect to solid supports and conjugating polypeptides thereto, including conjugating a 
plurality of isolated protease domains to a solid support, the high level of skill of those in this 
art, and the knowledge of those of skill in the art, Appellant respectfully submits that it would 
not require undue experimentation for one of skill in the art to make and use the solid supports 
of claim 40 or the arrays of claim 41. Hence, a consideration of the factors enumerated above 
leads to the conclusion that undue experimentation would not be required to make and use the 
solid supports of claim 40, comprising two or more polypeptides of claim 1 linked thereto either 
directly or via a linker, or the arrays of claim 41. Accordingly, Appellant respectfully 
submits that this rejection of claims 40 and 41 under 35 U.S.C. §112, first paragraph, is 
erroneous in law and fact and, therefore, should be reversed. 
Dependent Claim 42 

Claim 42 depends from claim 41 and recites that the array comprises polypeptides
having different MTSP protease domains. The arguments set forth above with respect to
claims 1, 40 and 41 are incorporated herein. The specification teaches that a plurality of
MTSP protease domains can be attached to a solid support (e.g., see page 132, lines 4-8).
Linking a protein to a solid support is routine in the biotechnology arts (e.g., see Means &
Feeney, Chemical Modifications of Proteins: History and Applications in Perspectives in
Bioconjugate Chemistry (Claude Meares, ed., 1993), Chapter 2, pages 10-20, Exhibit 23).
Whether the protein to be conjugated to a solid support is a single species or multiple species 
of MTSP protease domain does not change the amount of experimentation required to form 
the claimed array. The skilled artisan readily can select the appropriate conjugation
chemistry based on the nature of the polypeptides and the solid support without undue 
experimentation and conjugate the polypeptide to the support using routine methods. 

In light of the breadth of claim 42, the extensive teachings in the specification with 
respect to solid supports and conjugating polypeptides thereto, including conjugating a 
plurality of isolated protease domains to a solid support, the high level of skill of those in this 
art, and the knowledge of those of skill in the art, Appellant respectfully submits that it would 
not require undue experimentation for one of skill in the art to make and use the arrays of claim 
42. Hence, a consideration of the factors enumerated above leads to the conclusion that undue 
experimentation would not be required to make and use the arrays of claim 42. Accordingly, 
Appellant respectfully submits that this rejection of claim 42 under 35 U.S.C. §112, first 
paragraph, is erroneous in law and fact and, therefore, should be reversed. 
Dependent Claims 113 and 114 

Claim 113 recites a solid support comprising two or more polypeptides of claim 12
linked thereto either directly or via a linker. Claim 114 depends from claim 113 and recites
that the polypeptides comprise an array. Hence, each of claims 113 and 114 includes the
polypeptide of claim 12 as an element. Claim 12 is not rejected under 35 U.S.C. §112, first
paragraph. Accordingly, the Examiner admits that the specification is enabling for the
subject matter of claim 12, which is directed to the substantially purified polypeptide of claim
1, wherein the MTSP protease domain consists of a sequence of amino acid residues selected
from among amino acids 615-855 of SEQ ID No. 2, amino acids 205-437 of SEQ ID NO. 4, 
the amino acid residues set forth as SEQ ID No. 6 or as amino acids 217-443 in SEQ ID No. 
12. 

The specification describes solid supports and methods for immobilizing MTSP 
protein to solid supports (e.g., see pages 131-136). For example, the specification teaches 
exemplary solid supports, including supports having any required structure and geometry, 
such as beads, pellets, disks, capillaries, hollow fibers, needles, solid fibers, random shapes, 
thin films and membranes (e.g., page 132, lines 26-29). The specification teaches that the 
solid support can be of any suitable material, such as inorganics, natural polymers, and 
synthetic polymers, including cellulose, cellulose derivatives, acrylic resins, glass, silica gels,
polystyrene, gelatin, polyvinyl pyrrolidone, co-polymers of vinyl and acrylamide, polystyrene 
cross-linked with divinylbenzene, polyacrylamides, latex gels, polystyrene, dextran,
polyacrylamides, rubber, silicon, plastics, nitrocellulose, celluloses, natural sponges and 
highly porous glasses (e.g., page 134, lines 1-30).

The specification teaches that a plurality of MTSP protease domains, including two or 
more protease domains, can be attached to a solid support (e.g., page 132, lines 4-8). The
instant specification defines an array as a collection of elements containing three or more
members and that, as in the case for an addressable array, the members of the array can be
immobilized to discrete identifiable loci on the surface of a solid phase (see page 35, lines 14-20).

The specification teaches that the polypeptide can be linked to the solid support 
directly or via a linker (e.g., page 132, lines 1-2). The specification describes various linking
technologies that can be used to link the polypeptide to the solid support (e.g., page 135, lines
1-30). These include reacting the protein with a reactive moiety on the solid support. The
specification describes exemplary reactive moieties, including amino silane linkages,
hydroxyl linkages, carboxysilane linkages, N-[3-(triethoxysilyl)propyl]phthalamic acid and
derivatized polystyrenes (page 133, lines 7-26). The specification also describes absorption
and adsorption and covalent binding to the support, either directly or via a linker, such as via
disulfide linkages or thioether bonds, and covalent bonds between free reactive groups, such
as amine and thiol groups, known to those of skill in the art (page 135, lines 11-26). Linking a
protein to a solid support is routine in the biotechnology arts (e.g., see Means & Feeney,
Chemical Modifications of Proteins: History and Applications in Perspectives in
Bioconjugate Chemistry (Claude Meares, ed., 1993), Chapter 2, pages 10-20, Exhibit 23).
The skilled artisan readily can select the appropriate conjugation chemistry based on the 
nature of the polypeptides and the solid support without undue experimentation and conjugate 
the polypeptide to the support using routine methods. 

In light of the breadth of claims 113 and 114, the extensive teachings in the 
specification with respect to solid supports and conjugating polypeptides thereto, including 
conjugating a plurality of isolated protease domains to a solid support, the high level of skill of 
those in this art, the knowledge of those of skill in the art, and the fact that the Examiner admits 
that the specification is enabling for the polypeptides of claim 12, Appellant respectfully 
submits that it would not require undue experimentation for one of skill in the art to conjugate 
the polypeptides of claim 12 to solid supports to make the solid supports of claim 113 and 
arrays of claim 114. Hence, a consideration of the factors enumerated above leads to the
conclusion that undue experimentation would not be required to make and use the solid
supports comprising two or more polypeptides of claim 12 linked thereto either directly or via
a linker of claim 113 or the arrays of claim 114. Accordingly, Appellant respectfully submits
that this rejection of claims 113 and 114 under 35 U.S.C. §112, first paragraph, is erroneous in
law and fact and, therefore, should be reversed.

Summary

In light of the breadth of the claims, the extensive teachings and examples in the
specification, the high level of skill of those in this art, the knowledge of those of skill in the
art, and the fact that it is predictable to identify protease domains in MTSP family members
and prepare single chain forms thereof as well as variants thereof, it would not require undue
experimentation for one of skill in the art to make and use polypeptides with the features as
claimed, or conjugates, solid supports or arrays that include the polypeptides. Hence, a
consideration of the factors enumerated above leads to the conclusion that undue
experimentation would not be required to make and use the subject matter as claimed.
Accordingly, Appellant respectfully submits that this rejection of claims 1, 11, 20, 34-36, 40-42,
113 and 114 under 35 U.S.C. §112, first paragraph, is erroneous in law and fact and,
therefore, should be reversed.

3. REJECTION OF CLAIMS 1, 11-13, 20, 34-36, 40-42, 113 AND 114 UNDER 35 U.S.C.
§102(b) - Takeuchi

Claims 1, 11-13, 20, 34-36, 40-42, 113 and 114 are rejected under 35 U.S.C. §102(b) as
being anticipated by Takeuchi, because the reference allegedly discloses "a polypeptide
comprising a fragment consisting of a serine protease domain that is 100% identical to amino
acids 615-855 of SEQ ID NO:2 of the instant invention" and discloses "a catalytically active
polypeptide comprising the serine protease domain linked to a His-tag." The Examiner states
that Takeuchi discloses that Cys at position 731 forms a disulfide bond with Cys 604 present in
the pro domain (see Final Office Action, Exhibit 2, page 17). The Examiner alleges that the
claim limitation "a free Cys in the protease domain is replaced with another amino acid" and "a
free Cys in the protease domain is replaced with a serine" is a product-by-process type
limitation. The Examiner alleges that

[t]he end result of the products of the claims is a serine protease domain or a
serine protease domain having a serine residue. Whether the product of the
claimed protein is obtained by replacing a free cysteine residue or not, the
product is still the same because the instant claims may be produced by the
recited modification or not. Therefore, there is no there a structure implied by
said limitations. Since the polypeptide of Takeuchi et al. consists of a protease
domain of a MTSP and the MTSP protease domain has serine protease activity,
the claims are anticipated by the prior art. Also, since the serine protease domain
of Takeuchi et al. has a serine residue, claim 20 is also anticipated.

The rejection respectfully is traversed. 

A. LEGAL STANDARDS - ANTICIPATION UNDER 35 U.S.C. § 102 

Anticipation is a factual determination that ". . . requires the presence in a single prior
art disclosure of each and every element of a claimed invention." Lewmar Marine, Inc. v.
Barient, Inc., 3 U.S.P.Q.2d 1766 (Fed. Cir. 1987). Moreover, "[a] claim is anticipated only if
each and every element as set forth in the claim is found, either expressly or inherently
described, in a single prior art reference." Verdegaal Bros. v. Union Oil of California, 2
U.S.P.Q.2d 1051, 1053 (Fed. Cir. 1987) (emphasis added).

Federal Circuit decisions have repeatedly emphasized the notion that anticipation 
cannot be found where less than all elements of a claimed invention are set forth in a 
reference. See, e.g., Transclean Corp. v. Bridgewood Services, Inc., 290 F.3d 1364 (Fed. Cir.
2002). In this regard, a reference disclosing "substantially the same thing" is not enough to 
anticipate. Jamesbury Corp. v. Litton Indust. Prod., Inc., 756 F.2d 1556, 1560 (Fed. Cir. 
1985). A reference must clearly disclose each and every limitation of the claimed invention 
before anticipation may be found. 

Further, anticipation cannot be shown by combining more than one reference to show 
the elements of the claimed invention. In re Saunders, 444 F.2d 599 (C.C.P.A. 1971). All
elements of a claimed invention must be disclosed in one, solitary reference. As such, it is 
clear that a reference cannot be utilized to render a claimed invention anticipated without 
identical disclosure. 

B. THE REJECTION OF CLAIMS 1, 11-13, 20, 34-36, 40-42, 113 AND 114
UNDER 35 U.S.C. §102(b) SHOULD BE REVERSED BECAUSE TAKEUCHI
DOES NOT ANTICIPATE THE CLAIMED SUBJECT MATTER

1. Disclosure of Takeuchi 

Takeuchi discloses a polypeptide that contains 855 amino acids and is designated MT-SP1.
This protein has sequence identity with the full-length MTSP1 set forth as SEQ ID NO:2
of the instant application. Takeuchi discloses an expression vector that includes nucleic acid
encoding the protease domain plus the pro-domain (see page 11055, left col., third full
paragraph). Takeuchi discloses that its expression vector includes the mature protease domain
and a small portion of the pro-domain and was designed to over-express the sequence encoding
a polypeptide containing amino acids 596-855 with a His-tag fusion to produce as a construct 
Met-Arg-Gly-Ser-His6-aa596-855 (page 11055, column 2, third full paragraph). Takeuchi
discloses that amino acids Cys 604 and Cys 731 are disulfide bonded (see, for example, page
11060, col. 1). Takeuchi discloses that its protease domain is disulfide bonded to the pro-
domain region (see page 11055, column 2, third full paragraph; page 11058, col. 1; and page
11060, col. 1, first paragraph) and that the pro-domain region remains bonded to the protease
domain after activation (page 11058, lines 8-9).

Takeuchi discloses that its "purified protease domain" includes the His-tag sequence 
and the pro-domain region bonded thereto, stating that a monoclonal antibody directed against 
the N-terminal Arg-Gly-Ser-His4 epitope is immunoreactive with its purified protein (see page
11058). It is not an isolated single chain protease domain. It is a two chain structure and it
includes amino acids in addition to the protease domain. Figure 3 cited by the Examiner as
showing an isolated protease domain is a diagrammatic representation of the MTSP1 protease
domains; it by no means is an isolated protease domain. Furthermore, the figure depicts the
disulfide bonds and does not show a free Cys in the protease domain, nor a fragment consisting
of the protease domain. Page 11057, referenced by the Examiner as describing isolation of
protease domain, does not do so. The polypeptide is expressed as a His-tagged polypeptide that
forms a two-chain structure by virtue of the Cys-Cys disulfide bonds depicted in Figure 3.
Furthermore, the paper discusses the activated His-tag extended polypeptide and describes its
activity (see, e.g., Figure 6 and page 11057, col. 2). Takeuchi states that:



the MT-SP1 protease domain was expressed in E. coli as a His-tagged fusion and
was purified from inclusion bodies under denaturing conditions by using metal-
chelate affinity chromatography. . . . This denatured protein refolded when the
urea was dialyzed from the protein. . . . N-terminal sequencing of the purified
activated [i.e. the two-chain folded form] yielded the expected VVGGT
activation sequence.

Thus, Takeuchi expresses a His-tagged form of the protein, which includes a protease domain 
and a pro-domain region, that forms a two chain structure when activation-cleaved. The
sequenced molecule includes the His-tagged protease domain. Takeuchi does not disclose or
contemplate an isolated polypeptide consisting of only the protease domain and does not
mention replacement of any Cys with Ser (the Cys in its two-chain form is not free).

Further, it is apparent from the disclosure that Takeuchi believes that a two-chain 
structure is a requisite for activity. Takeuchi discusses the need for activation cleavage and 
depicts the disulfide bond; there is no disclosure of a polypeptide in which there is a free Cys.
Hence, there is no disclosure for replacing any free Cys with another amino acid, such as a
serine. There is no mention of replacement of any amino acids in its polypeptide.

Hence Takeuchi does not disclose isolation of a polypeptide consisting only of the
protease domain of any MTSP, including an MTSP1. Its polypeptide includes a His-tag
sequence; the active form of the enzyme includes a disulfide bond between the protease 
domain and a pro-domain region. In addition, the only isolation of a polypeptide including the 
protease domain (which includes the His-tag), was for sequencing purposes. 

2. Analysis

Independent Claim 1

In maintaining the rejection, the Examiner states on page 18 of the Final Office Action
(Exhibit 2) that:

[t]he limitation "a free Cys in the protease domain is replaced with another amino acid"
and "a free Cys in the protease domain is replaced with a serine" is a product-by-process
type limitation. The end result of the products of the claims is a serine protease domain or
a serine protease domain having a serine residue. Whether the product of the claimed
protein is obtained by replacing a free cysteine residue or not, the product is still the same
because the instant claims may be produced by the recited modification or not. Therefore,
there is no [] structure implied by said limitations. Since the polypeptide of Takeuchi et
al. consists of a protease domain of a MTSP and the MTSP protease domain has serine
protease activity, the claims are anticipated by the prior art. Also, since the serine protease
domain of Takeuchi et al. has a serine residue, claim 20 is also anticipated.

Appellant respectfully disagrees. Claim 1 recites that the isolated substantially purified
polypeptide consists only of a protease domain or a smaller catalytically active portion of the
protease as a single chain, and that a free Cys residue of the serine protease domain is replaced
with another amino acid. This is not a "product-by-process type" limitation as alleged by the
Examiner, but a limitation on the molecular structure of the single chain polypeptide.

A product-by-process claim is a product claim that defines the claimed product in terms
of the process by which it is made. In re Luck, 476 F.2d 650, 177 USPQ 523 (CCPA 1973); In
re Pilkington, 411 F.2d 1345, 162 USPQ 145 (CCPA 1969); In re Steppan, 394 F.2d 1013, 156
USPQ 143 (CCPA 1967). Appellant respectfully submits that the instant claims do not define
the product in terms of the process by which it is made. The specification teaches that a
single-chain form of a serine protease domain has a free Cys residue. For example, page 58,
lines 12-20 recites:

Muteins of the MTSP1 proteins are provided. In the activated double chain molecule,
residue 731 forms a disulfide bond with the Cys at residue 604. In the single chain form,
the residue at 731 in the protease domain is free. Muteins in which Cys residues,
particularly the free Cys residue (amino acid 731 in SEQ ID No. 2) in the single chain
protease domain [is replaced] are provided. Other muteins in which conservative amino
acids replacements are effected and that retain proteolytic activity as a single chain are 
also provided. Such changes may be systematically introduced and tested for activity in in 
vitro assays, such as those provided herein. 

The Cys residue in the protease domain in the MTSP protein forms a disulfide bond with a Cys 
residue in pro-domain region, and autoactivation results in a polypeptide with a two-chain 
structure by virtue of the Cys-Cys disulfide bonds. Isolating the serine protease domain so that 
it is free from the pro-domain region results in unpaired Cys residues, because the single-chain 
isolated protease domain is not bonded to a Cys in another region of the protein, such as the 
pro-domain region. Hence, the isolated polypeptide consisting only of the protease domain 
will have a free Cys residue (a Cys residue that "does not form disulfide linkages with any 
other Cys residue in the protein," see page 10, lines 5-6 of the instant specification). Thus, the 
isolation of the protease domain results in a free Cys residue. Isolation of the protease domain 
does not result in a free Cys residue that is replaced with another amino acid. Further, the 
single chain form of the single chain protease domain can be made by recombinant expression 
in a vector, thus eliminating the need to "isolate" it from the expressed zymogen form of the 
enzyme. The isolated single chain form of the serine protease domain is not produced by 
replacing a free Cys residue with another amino acid. Hence, the claimed polypeptide is not
defined in terms of the process by which it was made. Accordingly, the instant claims are not
"product-by-process" claims. The polypeptides of Takeuchi et al. are two-chain polypeptides
and do not contain a free Cys; hence they cannot contain a replaced free Cys. 

The limitation a free Cys residue of the serine protease domain is replaced with another 
amino acid is a structural limitation on the molecular architecture of the polypeptide. Cys 
residues readily form disulfide bonds due to the presence of the sulfhydryl group (e.g., see
Zubay, Biochemistry (1983), pages 12-13, Exhibit 45). Other amino acid residues do not have
this functionality. For example, serine residues have a hydroxyl group instead of a sulfhydryl
group and thus do not form disulfide bonds. Hence, replacing a free Cys residue in the
protease domain of the polypeptide with another amino acid, such as a serine residue, as is
claimed in claim 20, results in a protease domain that cannot form a disulfide bond with
another region in the polypeptide. Hence, the recited limitation is a structural limitation. If the
claims recited "wherein a sulfhydryl group is replaced with another functionality" instead of
"wherein a free Cys residue of the serine protease domain is replaced with another amino acid" 
there would be no question that the recitation is a structural limitation on the claimed 
compound. Because the recitation limits the structure of the polypeptide, the recited limitation
a free Cys residue of the serine protease domain is replaced with another amino acid should be
afforded patentable weight. "All words in a claim must be considered in judging the 
patentability of that claim against the prior art." In re Wilson, 424 F.2d 1382, 1385, 165 USPQ 
494, 496 (CCPA 1970). 

Appellant respectfully submits that Takeuchi does not disclose every element of the 
claimed subject matter. 

(1) Free Cys residue

Takeuchi does not disclose a serine protease domain of an MTSP polypeptide that has a
free Cys residue. Figure 3 of Takeuchi, for example, is a diagrammatic representation of the
full-length MTSP1 depicting the activated disulfide-bonded form of the enzyme, in which the
Cys residue of the protease domain is part of a disulfide bond with a Cys residue in the pro-
domain. Figure 4 of Takeuchi, which shows multiple sequence alignments of MTSP1
structural motifs, identifies Cys residues that participate in disulfide bonds. All of the Cys
residues in Figure 4 are shown as being disulfide bonded; there are no free Cys residues.
Takeuchi discloses that its protease domain is disulfide bonded to the pro-domain region and
remains bonded to the protease domain after activation and thus Takeuchi does not disclose a
protease domain having a free Cys residue. 

(2) Replacing a free Cys residue with another amino acid

There is no disclosure in Takeuchi with respect to replacement of any amino acid in its 
polypeptide. Takeuchi does not disclose replacing any amino acid in the serine protease 
domain with another amino acid. As discussed above, Takeuchi does not disclose a serine 
protease domain of an MTSP polypeptide that has a free Cys residue. Hence, Takeuchi does 
not disclose replacing a free Cys residue of the serine protease domain of an MTSP 
polypeptide with another amino acid. 

The Examiner's argument that "the serine protease domain of Takeuchi has a serine 
residue" and thus "claim 20 is also anticipated" is incorrect. Claim 20 does not recite a serine
protease domain that has a serine residue. The claims recite that a free Cys residue of the 
serine protease domain of an MTSP polypeptide is replaced with another amino acid. There is 
no disclosure in Takeuchi of a protease domain of an MTSP polypeptide having a free Cys 
residue of the serine protease domain replaced with another amino acid. It is irrelevant 
whether other amino acid residues in the protease domain are serine residues. 






(3) An isolated, substantially purified protease domain of an MTSP
polypeptide

Takeuchi discloses that its protease domain is disulfide bonded to the pro-domain
region and remains bonded to the protease domain after activation. Takeuchi discloses that its
"purified protease domain" includes the His-tag sequence, and states that a monoclonal
antibody directed against the N-terminal Arg-Gly-Ser-His4 epitope is immunoreactive with its
purified protein. Thus, the "purified protease domain" disclosed by Takeuchi includes
additional amino acid residues in addition to the protease domain of the MTSP1. Neither page
11057 nor Figure 3 of Takeuchi discloses a single chain polypeptide that consists only of the
protease domain. As discussed above, the protease domain as expressed and isolated by
Takeuchi includes additional amino acids. Takeuchi states that:

N-terminal sequencing of the purified activated [i.e. the two-chain folded
form] yielded the expected VVGGT activation sequence.

The purified activated polypeptide according to Takeuchi is a two chain polypeptide, and also, 
as expressed, includes the His-tag for purification. Figure 3, as noted, is a diagrammatic 
representation of the full-length MTSP1 depicting the activated disulfide-bonded form of the
enzyme (in which the Cys that is replaced in the instant claims is part of the disulfide bond).
Hence, Takeuchi does not disclose a polypeptide consisting only of a protease domain or a
smaller catalytically active portion of the protease domain. Thus, Takeuchi does not disclose
an isolated, substantially purified protease domain of an MTSP polypeptide having a free Cys
residue replaced with another amino acid. Hence, the disclosure of Takeuchi does not disclose
every element of claim 1. Therefore, Takeuchi does not anticipate claim 1 nor any claim
dependent thereon. Accordingly, Appellant respectfully submits that the rejection of claim 1 as
anticipated by Takeuchi is erroneous in law and fact and, therefore, should be reversed. 

For the reasons above, Takeuchi does not anticipate any of the dependent claims.
Additional reasons why Takeuchi does not anticipate each dependent claim are
described below.

Dependent Claim 11 

Claim 11 depends from claim 1 and recites that the MTSP is selected from among
MTSP1, MTSP3, MTSP4 and MTSP6. Claim 11 includes every limitation of claim 1, from
which it depends. For the reasons discussed above with respect to claim 1, Takeuchi does not
disclose every element of claim 11 and therefore does not anticipate claim 11. Accordingly,
Appellant respectfully submits that the rejection of claim 11 as anticipated by Takeuchi is
erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 12

Claim 12 depends from claim 1 and recites that the MTSP protease domain consists of a
sequence of amino acid residues selected from among amino acids 615-855 of SEQ ID No. 2
(MTSP1 protease domain), amino acids 205-437 of SEQ ID NO. 4 (MTSP3), the amino acid
residues set forth as SEQ ID No. 6 (MTSP4) or as amino acids 217-443 in SEQ ID No. 12
(MTSP6), where the free Cys is replaced with Ser. Claim 12 includes every limitation of claim
1, from which it depends. For the reasons discussed above with respect to claim 1, Takeuchi
does not disclose every element of claim 12 and therefore does not anticipate claim 12.
Accordingly, Appellant respectfully submits that the rejection of claim 12 as anticipated by
Takeuchi is erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 13

Claim 13 depends from claim 1 and recites that the substantially purified polypeptide
has at least about 95% sequence identity with a protease domain consisting of a sequence of
amino acid residues selected from among amino acids 615-855 of SEQ ID No. 2, amino acids
205-437 of SEQ ID NO. 4, the amino acids set forth as SEQ ID No. 6, and amino acids 217-
443 in SEQ ID No. 12. Claim 13 includes every limitation of claim 1, from which it depends.
For the reasons discussed above with respect to claim 1, Takeuchi does not disclose every
element of claim 13 and therefore does not anticipate claim 13. Accordingly, Appellant
respectfully submits that the rejection of claim 13 as anticipated by Takeuchi is erroneous in
law and fact and, therefore, should be reversed.

Dependent Claim 20

Claim 20 depends from claim 1 and recites that a free Cys in the protease domain is
replaced with a serine. Claim 20 includes every limitation of claim 1, from which it depends.
As discussed above, Takeuchi does not disclose a serine protease domain of an MTSP
polypeptide that has a free Cys residue. There is no disclosure in Takeuchi with respect to
replacement of any amino acids in its polypeptide. Takeuchi does not disclose replacing any
amino acid in the serine protease domain with another amino acid. Takeuchi does not disclose
replacing a free Cys residue of the serine protease domain of an MTSP polypeptide with a
serine. Thus, for these reasons and the reasons discussed above with respect to claim 1,
Takeuchi does not disclose every element of claim 20 and therefore does not anticipate claim
20. Accordingly, Appellant respectfully submits that the rejection of claim 20 as anticipated by 
Takeuchi is erroneous in law and fact and, therefore, should be reversed. 

Dependent Claim 34 

Claim 34 depends from claim 1 and recites that the MTSP is selected from among corin, 
MTSP1, enteropeptidase, human airway trypsin-like protease (HAT), TMPRSS2, and
TMPRSS4. Claim 34 includes every limitation of claim 1, from which it depends. Thus, for the
reasons discussed above with respect to claim 1, Takeuchi does not disclose every element of
claim 34 and therefore does not anticipate claim 34. Accordingly, Appellant respectfully
submits that the rejection of claim 34 as anticipated by Takeuchi is erroneous in law and fact 
and, therefore, should be reversed. 

Dependent Claim 40 

Claim 40 recites a solid support comprising two or more polypeptides of claim 1 
linked thereto either directly or via a linker. Takeuchi does not disclose an isolated single- 
chained polypeptide consisting only of an MTSP protease domain in which a free Cys has 
been replaced with another amino acid nor conjugating two or more such isolated protease 
domains to a solid support. Hence, there is no disclosure in Takeuchi of a solid support that 
includes two or more isolated single-chained polypeptides consisting only of an MTSP 
protease domain in which a free Cys was replaced with another amino acid. Thus, for these 
reasons and the reasons discussed above with respect to claim 1, Takeuchi does not disclose
every element of claim 40 and therefore does not anticipate claim 40. Accordingly, Appellant 
respectfully submits that the rejection of claim 40 as anticipated by Takeuchi is erroneous in 
law and fact and, therefore, should be reversed. 

Dependent Claim 41 

Claim 41 recites a solid support comprising two or more polypeptides of claim 1 
linked thereto either directly or via a linker where the polypeptides comprise an array. The 
specification defines an array as a collection of elements containing three or more members. 
As discussed above, Takeuchi does not disclose isolating the protease domain and preparing 
it as a single chain and modifying the single-chain polypeptide that has a free Cys residue by 
replacing the free Cys residue with another amino acid. Takeuchi does not disclose a solid 
support that includes three or more isolated single-chained polypeptides consisting only of an 
MTSP protease domain in which a free Cys was replaced with another amino acid. Thus, for 
these reasons and the reasons discussed above with respect to claim 1, Takeuchi does not
disclose every element of claim 41 and therefore does not anticipate claim 41. Accordingly,
Appellant respectfully submits that the rejection of claim 41 as anticipated by Takeuchi is 
erroneous in law and fact and, therefore, should be reversed. 



Dependent Claim 42

Claim 42 depends from claim 41 and recites that the array comprises polypeptides
having different MTSP protease domains. As discussed above, Takeuchi does not disclose
isolating the protease domain and preparing it as a single chain nor replacing any amino acid
in the MTSP polypeptide with another amino acid. Takeuchi does not disclose modifying a
single-chain polypeptide that has a free Cys residue by replacing the free Cys residue with
another amino acid. Takeuchi does not disclose a solid support that includes three or more
isolated single-chained polypeptides consisting only of an MTSP protease domain in which a
free Cys was replaced with another amino acid. Takeuchi does not disclose a solid support
that includes three or more isolated protease domains in which a free Cys was replaced with
another amino acid, where the protease domains are from different MTSPs. Thus, for these
reasons and the reasons discussed above with respect to claim 1, Takeuchi does not disclose
every element of claim 42 and therefore does not anticipate claim 42. Accordingly,
Appellant respectfully submits that the rejection of claim 42 as anticipated by Takeuchi is
erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 113

Claim 113 recites a solid support comprising two or more polypeptides of claim 12
linked thereto either directly or via a linker. Claim 12 depends from claim 1 and specifies
that the MTSP protease domain consists of a sequence of amino acid residues selected from
among amino acids 615-855 of SEQ ID No. 2, amino acids 205-437 of SEQ ID NO. 4, the
amino acid residues set forth as SEQ ID No. 6 or as amino acids 217-443 in SEQ ID No. 12.
Claim 12 includes every limitation of claim 1, from which it depends.

Takeuchi does not disclose isolating the protease domain and preparing it as a single
chain. Takeuchi does not disclose replacing any amino acid in the MTSP polypeptide with
another amino acid, and does not disclose modifying a single-chain polypeptide that has a free
Cys residue by replacing the free Cys residue with another amino acid. There is no disclosure
in Takeuchi of a solid support that includes two or more isolated single-chained polypeptides
consisting only of an MTSP protease domain in which a free Cys was replaced with another
amino acid. Thus, for these reasons and the reasons discussed above with respect to claim 1
and claim 12, Takeuchi does not disclose every element of claim 113 and therefore does not
anticipate claim 113. Accordingly, Appellant respectfully submits that the rejection of claim
113 as anticipated by Takeuchi is erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 114 

Claim 114 depends from claim 113 and recites that the polypeptides comprise an
array. As discussed above, Takeuchi does not disclose isolating the protease domain and 
preparing it as a single chain. Takeuchi does not disclose replacing any amino acid in the 
MTSP polypeptide with another amino acid, and does not disclose modifying a single-chain 
polypeptide that has a free Cys residue by replacing the free Cys residue with another amino 
acid. There is no disclosure in Takeuchi of a solid support that includes three or more 
isolated single-chained polypeptides consisting only of an MTSP protease domain in which a 
free Cys was replaced with another amino acid. Thus, for these reasons and the reasons 
discussed above with respect to claim 1 and claim 113, Takeuchi does not disclose every 
element of claim 114 and therefore does not anticipate claim 114. Accordingly, Appellant
respectfully submits that the rejection of claim 114 as anticipated by Takeuchi is erroneous in
law and fact and, therefore, should be reversed. 

Summary 

Appellant respectfully submits that, in light of the above, the Examiner has failed to 
establish claims 1, 11-13, 20, 34-36, 40-42, 113 and 114 as anticipated by Takeuchi under 35
U.S.C. §102(b). Accordingly, Appellant respectfully submits that the rejection of claims
1, 11-13, 20, 34-36, 40-42, 113 and 114 as anticipated by Takeuchi is erroneous in law and fact and,
therefore, should be reversed. 

. THE REJECTION OF CLAIMS 1, 11-13 AND 34 UNDER 35 U.S.C. §102(e)/103(a) 

In the Final Office Action (Exhibit 1), on page 19, claims 1, 11-13 and 34 are rejected 
as obvious under 35 U.S.C. §103(a) over O'Brien and there is no mention of a rejection under
35 U.S.C. § 102(e), although the rejection is set forth under the heading "Claim Rejections -
35 USC §102/103." In the paragraph bridging pages 20 and 21 of the Final Office Action,
however, the Examiner states that the claims are anticipated by O'Brien. Accordingly,
Appellant separately traverses the rejection of claims 1, 11-13 and 34 under 35 U.S.C.
§ 102(e) as anticipated by O'Brien and the rejection of claims 1, 11-13 and 34 as obvious
under 35 U.S.C. §103(a) over O'Brien.




The 102(e) rejection 

The Examiner alleges that the limitation "a free Cys residue of the serine protease
domain is replaced with another amino acid" is a "product-by-process type" limitation, and that
"whether the product is obtained by replacing a free cysteine residue or not, the product is still
the same because the instant claims may be produced by the recited modification or not" and
concludes that "there is no structure implied by said limitations." The Final Office Action
concludes that the disclosed molecules in O'Brien anticipate the claimed subject matter.

A. LEGAL STANDARDS - ANTICIPATION UNDER 35 U.S.C. § 102(b) 

The law with respect to anticipation under 35 U.S.C. § 102(a) is discussed above. 

B. THE REJECTION OF CLAIMS 1, 11-13 AND 34 UNDER 35 U.S.C. §102(b)
SHOULD BE REVERSED BECAUSE O'BRIEN DOES NOT ANTICIPATE
THE CLAIMED SUBJECT MATTER

1. The disclosure of O'Brien 

O'Brien discloses a protein identified therein as TADG-15, which is an MTSP1 variant,
with a sequence of amino acids as set forth as SEQ ID NO:2. The reference also discloses a 
comparison of the amino acid sequence of the protease domain of TADG-15 (SEQ ID NO: 14) 
with other serine protease catalytic domains (see Figure 2). O'Brien discloses that TADG-15 is 
a highly over-expressed gene in tumors and suggests that TADG-15 is novel in its component 
structure of domains because it has a protease catalytic domain that could be released in vivo 
and used as a diagnostic in vivo and that potentially could be a target for therapeutic
intervention (col. 15, lines 31-38):

TADG-15 is a highly overexpressed gene in tumors. It is expressed in a 
limited number of normal tissues, primarily tissues that are involved in either 
uptake or secretion of molecules e.g. colon and pancreas. TADG-15 is further 
novel in its component structure of domains in that it has a protease catalytic 
domain which could be released and used as a diagnostic and which has the 
potential for a target for therapeutic intervention. 

Thus, O'Brien states that the TADG-15 protease domain possibly could be released in vivo 
and serve as a therapeutic target, not as a therapeutic. O'Brien does not disclose, teach or 
suggest or mention or even hint at isolating the protease domain nor provide any disclosure 
that isolation of a protease domain would result in a free Cys that should be replaced.

O'Brien does not disclose isolation of the protease domain as a single-chain 
polypeptide that consists only of the protease domain as a single chain. O'Brien does not 
disclose a protease domain of an MTSP polypeptide that has a free Cys residue, or replacing
a free Cys residue of a serine protease domain of an MTSP polypeptide with another amino
acid. 

2. ANALYSIS 

Independent Claim 1 

Claim 1 recites that the isolated substantially purified polypeptide consists of a 
protease domain or a smaller catalytically active portion of the protease as a single chain, and 
that a free Cys residue of the serine protease domain is replaced with another amino acid. 
O'Brien does not disclose an isolated polypeptide that consists only of a protease domain or a 
smaller catalytically active portion of the protease as a single chain. O'Brien does not 
disclose an isolated single-chain protease domain of an MTSP polypeptide having a free Cys 
residue, or replacing a free Cys residue of an isolated single-chain serine protease domain of 
an MTSP polypeptide with another amino acid. In the previous Office Action, mailed April 
21, 2006 (Exhibit 46, at page 20, lines 6-7), the Examiner states that O'Brien does not 
disclose a protease domain that has been purified. Hence, O'Brien does not disclose every
element of claim 1.

In addition, as discussed above, O'Brien does not disclose an isolated protease 
domain of an MTSP. Stating that such protease domain could be released in vivo and used as 
a diagnostic target does not constitute a disclosure of an isolated single chain protease 
domain, and certainly does not constitute disclosure of an isolated protease domain in which 
a free Cys is replaced. 

In maintaining the rejection, the Examiner states on page 20 of the Final Office Action 
(Exhibit 1) that

[t]he limitation "a free Cys in the protease domain is replaced with another amino acid" is 
a product-by-process type limitation. The end result of the products of the claims is a 
serine protease domain. Whether the product of the claimed protein is obtained by 
replacing a free cysteine residue or not, the product is still the same because the instant 
claims may be produced by the recited modification or not. Therefore, there is no there a 
structure implied by said limitations. Since the polypeptide of O'Brien et al. consists of a
protease domain of a MTSP and the MTSP protease domain has serine protease activity, 
the claims are anticipated by the prior art. 

Appellant respectfully submits that "a free Cys residue of the serine protease domain is
replaced with another amino acid" is not a "product-by-process type" limitation as alleged by
the Examiner, but a limitation on the molecular structure of the single chain polypeptide. A
product-by-process claim is a product claim that defines the claimed product in terms of the 
process by which it is made. Appellant respectfully submits that the instant claims do not
define the product in terms of the process by which it is made. As taught in the specification
(e.g., see page 58, lines 12-20, which is reproduced above in the traverse of the rejection over
Takeuchi), the Cys residue in the protease domain in the MTSP protein forms a disulfide bond
with a Cys residue in the pro-domain region, and autoactivation results in a polypeptide with a
two-chain structure by virtue of the Cys-Cys disulfide bonds. Isolating the serine protease
domain so that it is free from the pro-domain region results in unpaired Cys residues, because
the Cys residue in the protease domain of the single-chain isolated protease domain is not
bonded to a Cys in another region of the protein, such as the pro-domain region. Thus, the
isolated polypeptide consisting only of the protease domain will have a free Cys residue (a Cys
residue that "does not form disulfide linkages with any other Cys residue in the protein," see
page 10, lines 5-6 of the instant specification). Thus, the isolation of the protease domain
results in a free Cys residue. Isolation of the protease domain does not result in a free Cys
residue being replaced with another amino acid. Further, the single chain form of the single
chain protease domain can be made by recombinant expression in a vector, thus eliminating the
need to "isolate" it from the expressed zymogen form of the enzyme. The isolated single chain
form of the serine protease domain is not produced by replacing a free Cys residue. Hence, the
claimed polypeptide is not defined in terms of the process by which it was made. Accordingly, 
the instant claims are not "product-by-process" claims. 

The limitation "a free Cys residue of the serine protease domain is replaced with another
amino acid" is a structural limitation on the molecular architecture of the polypeptide. Cys
residues readily form disulfide bonds due to the presence of the sulfhydryl group (e.g., see
Zubay, Biochemistry (1983), pages 12-13, Exhibit 45). Other amino acid residues do not have
this functionality. For example, serine residues have a hydroxyl group instead of a sulfhydryl 
group and thus do not form disulfide bonds. Hence, replacing a free Cys residue in the 
protease domain of the polypeptide with another amino acid, such as a Ser residue, as is 
claimed in claim 20, results in a protease domain that cannot form a disulfide bond with 
another region in the polypeptide. Hence, the recited limitation is a structural limitation. 
Because the recitation limits the structure of the polypeptide, the recitation should be afforded 
patentable weight. "All words in a claim must be considered in judging the patentability of
that claim against the prior art." In re Wilson, 424 F.2d 1382, 1385, 165 USPQ 494, 496
(CCPA 1970). 




Hence, O'Brien does not disclose every element of claim 1. Therefore O'Brien does
not anticipate claim 1 nor any claim dependent thereon. Accordingly, Appellant respectfully 
submits that the rejection of claim 1 as anticipated by O'Brien is erroneous in law and fact 
and, therefore, should be reversed. 

For the reasons above, O'Brien does not anticipate any of the dependent claims and, 
further, additional reasons why O'Brien does not anticipate each dependent claim are described 
below. 



Dependent Claim 11

Claim 11 depends from claim 1 and specifies that the MTSP is selected from among
MTSP1, MTSP3, MTSP4 and MTSP6. Claim 11 includes every limitation of claim 1, from
which it depends. Thus, for the reasons discussed above with respect to claim 1, O'Brien
does not disclose every element of claim 11 and therefore does not anticipate claim 11.
Accordingly, Appellant respectfully submits that the rejection of claim 11 as anticipated by
O'Brien is erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 12

Claim 12 depends from claim 1 and specifies that the MTSP protease domain consists
of a sequence of amino acid residues selected from among amino acids 615-855 of SEQ ID No.
2, amino acids 205-437 of SEQ ID NO. 4, the amino acid residues set forth as SEQ ID No. 6 or
as amino acids 217-443 in SEQ ID No. 12. Claim 12 includes every limitation of claim 1, from
which it depends. Thus, for the reasons discussed above with respect to claim 1, O'Brien does
not disclose every element of claim 12 and therefore does not anticipate claim 12.
Accordingly, Appellant respectfully submits that the rejection of claim 12 as anticipated by
O'Brien is erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 13

Claim 13 depends from claim 1 and specifies that the substantially purified polypeptide
has at least about 95% sequence identity with a protease domain consisting of a sequence of
amino acid residues selected from among amino acids 615-855 of SEQ ID No. 2, amino acids
205-437 of SEQ ID NO. 4, the amino acids set forth as SEQ ID No. 6, and amino acids 217-443
in SEQ ID No. 12. Claim 13 includes every limitation of claim 1, from which it depends.
Thus, for the reasons discussed above with respect to claim 1, O'Brien does not disclose every
element of claim 13 and therefore does not anticipate claim 13. Accordingly, Appellant
respectfully submits that the rejection of claim 13 as anticipated by O'Brien is erroneous in law
and fact and, therefore, should be reversed.



Dependent Claim 34

Claim 34 depends from claim 1 and specifies that the MTSP is selected from among
corin, MTSP1, enteropeptidase, human airway trypsin-like protease (HAT), TMPRSS2, and
TMPRSS4. Claim 34 includes every limitation of claim 1, from which it depends. Thus, for the
reasons discussed above with respect to claim 1, O'Brien does not disclose every element of
claim 34 and therefore does not anticipate claim 34. Accordingly, Appellant respectfully
submits that the rejection of claim 34 as anticipated by O'Brien is erroneous in law and fact and,
therefore, should be reversed.

Summary

Appellant respectfully submits that, in light of the above, the Examiner has failed to
establish claims 1, 11-13 and 34 as anticipated under 35 U.S.C. §102(b) by O'Brien.
Accordingly, Appellant respectfully submits that the rejection of claims 1, 11-13 and 34 as
anticipated by O'Brien is erroneous in law and fact and, therefore, should be reversed.

5. THE REJECTION OF CLAIMS 1, 11-13 AND 34 AND CLAIMS 35, 36, 40-42, 113
AND 114 UNDER 35 U.S.C. §103(a) - O'Brien

Claims 1, 11-13 and 34, as well as claims 35, 36, 40-42, 113 and 114, are rejected as
unpatentable over O'Brien under 35 U.S.C. § 103(a) because O'Brien allegedly teaches a
method of expressing polypeptides in host cells and that it teaches that the protease domain
could be released from the polypeptide and used as a diagnostic that has the potential for
therapeutic intervention. Thus, the Final Office Action concludes that it would have been
obvious to one of skill in the art to express the protease domain disclosed as SEQ ID NO: 14
by O'Brien and purify the polypeptide. It is alleged that the motivation to make such
polypeptides is the disclosed use as a diagnostic for therapeutic intervention. Further, it is
alleged that one of ordinary skill in the art would have had a reasonable expectation of
success since the expression of heterologous polypeptides was routine in the art and O'Brien
teaches how to express heterologous polypeptides. The Examiner also alleges that the
limitation "a free Cys residue of the serine protease domain is replaced with another amino
acid" is a "product-by-process type" limitation, and that "whether the product is obtained by
replacing a free cysteine residue or not, the product is still the same because the instant
claims may be produced by the recited modification or not" and concludes that "there is no
structure implied by said limitations."

The rejection respectfully is traversed. As discussed above, O'Brien et al. speculates
that the protease domain of TADG-15 could be released in vivo and, if it turns out that it is
released in vivo, the protease domain could serve as a therapeutic target. This is not a teaching
or suggestion or even hint for producing the protease domain in vitro and using it as a
therapeutic (not a target) or as a diagnostic reagent (not as a target). There is nothing taught
or suggested in O'Brien et al. that would have led one of ordinary skill in the art to isolate the
protease domain (or a catalytically active fragment thereof) and replace what ends up as a free
Cys with another amino acid.

A. LEGAL STANDARDS - OBVIOUSNESS UNDER 35 U.S.C. § 103(a)

For prima facie obviousness of claimed subject matter to be established under 35 U.S.C. 
§103, all the claim limitations must be taught or suggested by the prior art. In re Royka, 490 F.2d 
981, 180 USPQ 580 (CCPA 1974). This principle of U.S. law regarding obviousness was not 
altered by the recent Supreme Court holding in KSR International Co. v. Teleflex Inc., 127 S.Ct.
1727, 82 USPQ2d 1385 (2007). In KSR, the Supreme Court stated that "Section 103 forbids
issuance of a patent when 'the differences between the subject matter sought to be patented and
the prior art are such that the subject matter as a whole would have been obvious at the time the
invention was made to a person having ordinary skill in the art to which said subject matter 
pertains.'" KSR Int'l Co. v. Teleflex Inc., 127 S.Ct. 1727, 1734, 82 USPQ2d 1385, 1391 (2007). 

The mere fact that prior art may be modified to produce the claimed product does not 
make the modification obvious unless the prior art suggests the desirability of the 
modification. In re Fritch, 23 U.S.P.Q.2d 1780 (Fed. Cir. 1992); see, also, In re Papesch, 315 
F.2d 381, 137 U.S.P.Q. 43 (CCPA 1963). Further, that which is within the capabilities of one 
skilled in the art is not synonymous with that which is obvious. Ex parte Gerlach, 212 USPQ 
471 (Bd. App. 1980).

Furthermore, the Supreme Court in KSR took the opportunity to reiterate a second 
long-standing principle of U.S. law: that a holding of obviousness requires the fact finder 
(here, the Examiner), to make explicit the analysis supporting a rejection under 35 U.S.C. § 103,
stating that "rejections on obviousness cannot be sustained by mere conclusory statements;
instead, there must be some articulated reasoning with some rational underpinning to support
the legal conclusion of obviousness." Id. at 1740-41, 82 USPQ2d at 1396 (citing In re Kahn,
441 F.3d 977, 988, 78 USPQ2d 1329, 1336 (Fed. Cir. 2006)). 






While the KSR Court rejected a rigid application of the teaching, suggestion, or
motivation ("TSM") test in an obviousness inquiry, the Court acknowledged the importance
of identifying "a reason that would have prompted a person of ordinary skill in the relevant
field to combine the elements in the way the claimed new invention does" in an obviousness
determination. KSR, 127 S. Ct. at 1731. The court stated in dicta that, where there is a

"market pressure to solve a problem and there are a finite number of 
identified, predictable solutions, a person of ordinary skill has good reason to 
pursue the known options within his or her technical grasp. If this leads to the 
anticipated success, it is likely the product not of innovation but of ordinary 
skill and common sense. In that instance the fact that a combination was 
obvious to try might show that it was obvious under § 103." 

In a post-KSR decision, PharmaStem Therapeutics, Inc. v. ViaCell, Inc., 491 F.3d
1342 (Fed. Cir. 2007), the Federal Circuit stated that:

an invention would not be invalid for obviousness if the inventor would have 
been motivated to vary all parameters or try each of numerous possible 
choices until one possibly arrived at a successful result, where the prior art 
gave either no indication of which parameters were critical or no direction as 
to which of many possible choices is likely to be successful. Likewise, an 
invention would not be deemed obvious if all that was suggested was to 
explore a new technology or general approach that seemed to be a promising 
field of experimentation, where the prior art gave only general guidance as to 
the particular form of the claimed invention or how to achieve it.

Furthermore, KSR has not overruled existing case law. See In re Papesch, (315 F.2d
381, 137 USPQ 43 (CCPA 1963)), In re Dillon, 919 F.2d 688, 16 USPQ2d 1897 (Fed. Cir.
1991), and In re Deuel (51 F.3d 1552, 1558-59, 34 USPQ2d 1210, 1215 (Fed. Cir. 1995)). "In
cases involving new compounds, it remains necessary to identify some reason that would have
led a chemist to modify a known compound in a particular manner to establish prima facie
obviousness of a new claimed compound." Takeda v. Alphapharm, 492 F.3d 1350 (Fed. Cir.
2007).

The mere fact that prior art may be modified to produce what is claimed does not 
make the modification obvious unless the prior art suggests the desirability of the 
modification. In re Fritch, 23 U.S.P.Q.2d 1780 (Fed. Cir. 1992); see, also, In re Papesch, 315
F.2d 381, 137 U.S.P.Q. 43 (CCPA 1963). In addition, if the proposed modification or
combination of the prior art would change the principle of operation of the prior art invention 
being modified, then the teachings of the references are not sufficient to render the claims 
prima facie obvious. In re Ratti, 270 F.2d 810, 123 USPQ 349 (CCPA 1959). 






The disclosure of the applicant cannot be used to hunt through the prior art for the
claimed elements and then combine them as claimed. In re Laskowski, 871 F.2d 115, 117, 10 
USPQ2d 1397, 1398 (Fed. Cir. 1989). "To imbue one of ordinary skill in the art with 
knowledge of the invention in suit, when no prior art reference or references of record convey 
or suggest that knowledge, is to fall victim to the insidious effect of a hindsight syndrome 
wherein that which only the inventor taught is used against its teacher." W.L. Gore &
Associates, Inc. v. Garlock Inc., 721 F.2d 1540, 1553, 220 USPQ 303, 312-13 (Fed. Cir. 1983).

B. THE REJECTION OF CLAIMS 1, 11-13, 34-36, 40-42, 113 AND 114 UNDER 35
U.S.C. §103(a) SHOULD BE REVERSED BECAUSE THE EXAMINER HAS
FAILED TO ESTABLISH A PRIMA FACIE CASE OF OBVIOUSNESS

1. The teachings of O'Brien 

The teachings of O'Brien are discussed above. O'Brien states that: 

TADG-15 is a highly overexpressed gene in tumors. It is expressed in a 
limited number of normal tissues, primarily tissues that are involved in either
uptake or secretion of molecules e.g. colon and pancreas. TADG-15 is further 
novel in its component structure of domains in that it has a protease catalytic 
domain which could be released and used as a diagnostic and which has the 
potential for a target for therapeutic intervention. 

O'Brien is speculating that the protease domain could be released in vivo and serve as a
therapeutic target, not as a therapeutic agent or diagnostic reagent. O'Brien does not teach
or suggest that the protease domain exists even in vivo as a single chain, and does not teach or 
suggest isolating it. In this passage, noted by the Examiner, O'Brien is discussing the 
expression of TADG-15 in tumors and other tissues and indicates that it is expressed on the 
surface of cells. Because of its structure, the protease domain could be presented on the 
surface of cells in vivo, and, thus, "could be released." Since it is over expressed in tumors, if 
released in vivo, it could serve as a diagnostic marker indicating the presence of tumor cells.
Use of its presence in vivo as a diagnostic marker for detection of tumors and/or as a 
therapeutic target is not a teaching or suggestion or hint for isolating the protease domain, nor 
for producing it as a single-chain polypeptide, nor for modifying it by replacing what would be 
a free Cys in a single chain form with another amino acid. 

Thus, O'Brien does not state or hint that the isolated single chain protease domain 
could be used as a therapeutic or as a diagnostic, and certainly does not teach or suggest then
modifying it by replacing a free Cys in the single chain polypeptide with another amino acid. 
Such teaching does not constitute even a hint or suggestion for isolation or production of a 
polypeptide consisting only of the single-chain protease domain of an MTSP, nor of a single
chain protease domain in which the free Cys (which results only by virtue of it being a single
chain) is replaced with another amino acid.

2. Analysis - the Examiner has failed to set forth a case of prima 
facie obviousness. 

Independent Claim 1 

O'Brien does not teach or suggest an isolated single chain protease domain of an 
MTSP polypeptide nor one in which a free Cys residue is replaced with another amino acid, 
such as a serine. There is no teaching or suggestion in O'Brien for preparing a polypeptide 
consisting only of a single-chain protease domain and modifying by replacing what is a free 
Cys in the single-chain form with another amino acid. The Examiner acknowledges that 
O'Brien does not teach a protease domain of an MTSP polypeptide where a free Cys residue 
in the protease domain is replaced with Ser residues. See, for example, the non-final Office 
Action, mailed June 25, 2007 (Exhibit 1), at page 25, which recites: 

The reference O'Brien et al. does not teach a serine protease domain of a MTPSP [sic]
polypeptides wherein free Cys residues have been replaced with Ser residues. 

Even post-KSR, "it remains necessary to identify some reason that would have led a chemist 
to modify a known compound in a particular manner to establish prima facie obviousness of a 
new claimed compound." Takeda Chem. Indus., Ltd. v. Alphapharm Pty., Ltd. (Fed. Cir. 
2007). 

In this instance, there is no teaching or suggestion in O'Brien for isolating a single 
chain polypeptide consisting only of an MTSP protease domain in which a free Cys is 
replaced with another amino acid. O'Brien provides no teaching or suggestion for isolating 
the protease domain and preparing it as a single chain. O'Brien does not teach or suggest 
replacing any amino acid in the MTSP polypeptide with another amino acid, and provides no 
teaching or suggestion for modifying a single-chain polypeptide having a free Cys residue by 
replacing the free Cys residue with another amino acid. 

For at least the reasons discussed above, O'Brien, alone or in combination with what
was known in the art, does not teach or suggest every element of independent claim 1.
Accordingly, Appellant respectfully submits that claim 1 is not taught or suggested by
O'Brien. Thus, the Examiner has failed to set forth a prima facie case of obviousness of
claim 1. Appellant respectfully submits that the rejection of claim 1 as obvious over O'Brien
is erroneous in law and fact and, therefore, should be reversed.




For the reasons above, the Examiner has failed to set forth a prima facie case of
obviousness over O'Brien for any of the dependent claims. Additional reasons why no prima
facie case of obviousness has been set forth for each dependent claim are described below.

Dependent Claim 11 

Claim 11 depends from claim 1 and specifies that the MTSP is selected from among
MTSP1, MTSP3, MTSP4 and MTSP6. Claim 11 includes every limitation of claim 1, from
which it depends. Thus, for the reasons discussed above with respect to claim 1, O'Brien,
alone or in combination with what was known in the art, does not teach or suggest every
element of claim 11. Accordingly, Appellant respectfully submits that claim 11 is not taught
or suggested by O'Brien. Thus, the Examiner has failed to set forth a prima facie case of
obviousness of claim 11. Appellant respectfully submits that the rejection of claim 11 as
obvious over O'Brien is erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 12 

Claim 12 depends from claim 1 and specifies that the MTSP protease domain consists
of a sequence of amino acid residues selected from among amino acids 615-855 of SEQ ID No.
2 (MTSP1), amino acids 205-437 of SEQ ID NO. 4 (MTSP3), the amino acid residues set forth
as SEQ ID No. 6 (MTSP4) or as amino acids 217-443 in SEQ ID No. 12 (MTSP6), where the
free Cys is replaced with another amino acid. Claim 12 includes every limitation of claim 1,
from which it depends. Thus, for the reasons discussed above with respect to claim 1, O'Brien,
alone or in combination with what was known in the art, does not teach or suggest every
element of claim 12. Accordingly, Appellant respectfully submits that claim 12 is not taught or
suggested by O'Brien. Thus, the Examiner has failed to set forth a prima facie case of
obviousness of claim 12. Appellant respectfully submits that the rejection of claim 12 as
obvious over O'Brien is erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 13 

Claim 13 depends from claim 1 and specifies that the substantially purified polypeptide
has at least about 95% sequence identity with a protease domain consisting of a sequence of
amino acid residues selected from among amino acids 615-855 of SEQ ID No. 2, amino acids
205-437 of SEQ ID NO. 4, the amino acids set forth as SEQ ID No. 6, and amino acids
217-443 in SEQ ID No. 12. Claim 13 includes every limitation of claim 1, from which it depends.
Thus, for the reasons discussed above with respect to claim 1, O'Brien, alone or in combination
with what was known in the art, does not teach or suggest every element of claim 13. Hence,
Appellant respectfully submits that claim 13 is not taught or suggested by O'Brien. Thus, the
Examiner has failed to set forth a prima facie case of obviousness of claim 13. Appellant
respectfully submits that the rejection of claim 13 as obvious over O'Brien is erroneous in law
and fact and, therefore, should be reversed.

Dependent Claim 34

Claim 34 depends from claim 1 and specifies that the MTSP is selected from among
corin, MTSP1, enteropeptidase, human airway trypsin-like protease (HAT), TMPRSS2, and
TMPRSS4. Claim 34 includes every limitation of claim 1, from which it depends. Thus, for the
reasons discussed above with respect to claim 1, O'Brien, alone or in combination with what was
known in the art, does not teach or suggest every element of claim 34. Accordingly, Appellant
respectfully submits that claim 34 is not taught or suggested by O'Brien. Thus, the Examiner
has failed to set forth a prima facie case of obviousness of claim 34. Appellant respectfully
submits that the rejection of claim 34 as obvious over O'Brien is erroneous in law and fact and,
therefore, should be reversed.

Dependent Claim 35 

Claim 35 is directed to a conjugate that comprises a) a polypeptide of claim 1 and b) 
a targeting agent linked to the protein directly or via a linker, wherein the conjugate has 
serine protease activity. The specification defines a targeting agent as 

any moiety, such as a protein or effective portion thereof, that provides specific binding 
of the conjugate to a cell surface receptor, which, preferably, internalizes the conjugate or 
MTSP portion thereof. A targeting agent may also be one that promotes or facilitates, for 
example, affinity isolation or purification of the conjugate; attachment of the conjugate to 
a surface; or detection of the conjugate or complexes containing the conjugate. 

(e.g., see page 38, lines 9-15).

Claim 35 recites that a targeting agent is linked to the protein of claim 1 directly or
via a linker and that the conjugate has serine protease activity. There is no teaching or
suggestion in O'Brien of conjugating a targeting agent to an isolated single-chain polypeptide
consisting only of an MTSP protease domain in which a free Cys was replaced with another
amino acid.

O'Brien teaches, at col. 9, lines 53-56, covalently linking another polypeptide to an 
intact TADG-15 polypeptide or to a fragment thereof. The cited section states: 

The fragment, or the intact TADG-15 polypeptide, may be covalently linked to another
polypeptide, e.g., which acts as a label, a ligand, or a means to increase
antigenicity. [emphasis added]




By "fragment" O'Brien means "antigenic fragment" or other fragment (see col. 9, lines
22-32), which describes fragments as 10 residues, typically 20 residues and "preferably at least 30
(e.g. 50) residues" in length, and indicates that they can be antigenic fragments for preparing
antibodies. From the context, O'Brien contemplates antigenic fragments. There is no
mention, teaching, suggestion or hint that the fragment is a catalytic domain or fragment
thereof.

O'Brien does not teach or suggest isolating the protease domain of TADG-15 and
conjugating it to another polypeptide. The Examiner alleges that the motivation for making
conjugates is to use them as a diagnostic, which has the potential to be a target for therapeutic
intervention (page 23 of the Office Action). Even if there were such a suggestion in O'Brien,
as noted above, there is no teaching or suggestion for isolating the protease domain or a
catalytically active portion thereof and replacing a free Cys residue. Hence there can be no
motivation to prepare conjugates. Furthermore, as discussed above, O'Brien suggests
isolating antigenic fragments, and linking them to another polypeptide, such as a label, ligand
or a means to increase antigenicity. O'Brien contemplates using antigenic fragments to
make antibodies because the TADG-15 polypeptide is considered a possible therapeutic
target, not a therapeutic agent or a diagnostic agent.

O'Brien teaches that TADG-15 is a highly over-expressed gene in tumors and
suggests that TADG-15 thus could be a potential target for therapeutic intervention (col. 15,
lines 31-38). One of ordinary skill in the art would not be led to conjugate a targeting
moiety to a target. O'Brien does not teach, suggest or mention conjugating a targeting agent
to an isolated protease domain. Accordingly, for these reasons and the reasons discussed
above with respect to claim 1, Appellant respectfully submits that claim 35 is not taught or
suggested by O'Brien. Thus, the Examiner has failed to set forth a prima facie case of
obviousness of claim 35. Appellant respectfully submits that the rejection of claim 35 as
obvious over O'Brien is erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 36 

Claim 36 depends from claim 35 and recites that the targeting agent permits i)
affinity isolation or purification of the conjugate; ii) attachment of the conjugate to a surface;
iii) detection of the conjugate; or iv) targeted delivery to a selected tissue or cell. As
discussed above, O'Brien does not teach or suggest isolating the protease domain of
TADG-15, replacing a free Cys with another amino acid and conjugating the single chain protease
domain to a targeting agent. Accordingly, for these reasons and the reasons discussed above
with respect to claim 1, Appellant respectfully submits that claim 36 is not taught or suggested
by O'Brien. Thus, the Examiner has failed to set forth a prima facie case of obviousness of
claim 36. Appellant respectfully submits that the rejection of claim 36 as obvious over
O'Brien is erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 40

Claim 40 is directed to a solid support comprising two or more polypeptides of claim
1 linked thereto either directly or via a linker. O'Brien does not mention a solid support.
There is no teaching or suggestion in O'Brien of a solid support that includes two or more
isolated single-chained polypeptides consisting only of an MTSP protease domain in which a
free Cys was replaced with another amino acid. In maintaining the rejection, the Examiner
states that "assays using polypeptides linked to the molecules taught by O'Brien et al. utilize
solid supports" (page 23 of the Office Action). In the assays described in O'Brien, a
hybridization probe to the nucleotide encoding TADG-15 polypeptide (such as in a standard
Northern blot assay) or an antibody to the TADG-15 polypeptide (such as in a standard
immunoassay) is attached to a solid support. Appellant respectfully submits that, although
such assays can use solid supports, O'Brien does not teach or suggest an isolated
single-chained polypeptide consisting only of an MTSP protease domain in which a free Cys was
replaced with another amino acid nor conjugating two or more such isolated protease
domains to a solid support. Accordingly, for these reasons and the reasons discussed above
with respect to claim 1, Appellant respectfully submits that claim 40 is not taught or
suggested by O'Brien. Thus, the Examiner has failed to set forth a prima facie case of
obviousness of claim 40. Appellant respectfully submits that the rejection of claim 40 as
obvious over O'Brien is erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 41 

Claim 41 recites a solid support comprising two or more polypeptides of claim 1 linked
thereto either directly or via a linker where the polypeptides comprise an array. The
specification defines an array as a collection of elements containing three or more members.
As discussed above, O'Brien does not mention a solid support. O'Brien provides no teaching
or suggestion for isolating the protease domain and preparing it as a single chain. There is no
teaching or suggestion in O'Brien of a solid support that includes three or more isolated
single-chained polypeptides consisting only of an MTSP protease domain in which a free Cys was
replaced with another amino acid. Accordingly, for these reasons and the reasons discussed
above with respect to claim 1, claim 41 is not taught or suggested by O'Brien. Thus, the
Examiner has failed to set forth a prima facie case of obviousness of claim 41. Appellant
respectfully submits that the rejection of claim 41 as obvious over O'Brien is erroneous in law
and fact and, therefore, should be reversed.

Dependent Claim 42

Claim 42 is directed to the solid support of claim 41, wherein the array comprises
polypeptides having different MTSP protease domains. There is no teaching or suggestion in
O'Brien of a solid support that includes three or more isolated single-chained polypeptides
consisting only of an MTSP protease domain in which a free Cys was replaced with another
amino acid. Further, the only MTSP taught in O'Brien is TADG-15. There is no teaching or
suggestion of any other MTSP. Hence, there can be no teaching or suggestion in O'Brien to
conjugate isolated protease domains from different MTSPs to a solid support to form an
array. Accordingly, for these reasons and the reasons discussed above with respect to claim
1, Appellant respectfully submits that claim 42 is not taught or suggested by O'Brien. Thus,
the Examiner has failed to set forth a prima facie case of obviousness of claim 42. Appellant
respectfully submits that the rejection of claim 42 as obvious over O'Brien is erroneous in
law and fact and, therefore, should be reversed.

Dependent Claim 113 

Claim 113 is directed to a solid support comprising two or more polypeptides of claim
12 linked thereto either directly or via a linker. Claim 12 depends from claim 1 and recites
that the MTSP protease domain consists of a sequence of amino acid residues selected from
among amino acids 615-855 of SEQ ID No. 2, amino acids 205-437 of SEQ ID NO. 4, the
amino acid residues set forth as SEQ ID No. 6 or as amino acids 217-443 in SEQ ID No. 12.
Claim 12 includes every limitation of claim 1, from which it depends.

O'Brien does not mention a solid support. Furthermore, there is no teaching or
suggestion in O'Brien of a solid support that includes two or more isolated single-chain
polypeptides consisting only of an MTSP protease domain in which a free Cys was replaced
with another amino acid. Accordingly, for these reasons and the reasons discussed above
with respect to claim 1, Appellant respectfully submits that claim 113 is not taught or
suggested by O'Brien. Thus, the Examiner has failed to set forth a prima facie case of obviousness
of claim 113. Appellant respectfully submits that the rejection of claim 113 as obvious over
O'Brien is erroneous in law and fact and, therefore, should be reversed.

Dependent Claim 114

Claim 114 depends from claim 113 and is directed to an array. The specification
defines an array as a collection of elements containing three or more members. O'Brien
provides no teaching or suggestion of an array that includes three or more isolated
single-chained polypeptides consisting only of an MTSP protease domain in which a free Cys was
replaced with another amino acid. Accordingly, for these reasons and the reasons discussed
above with respect to claim 1, claim 114 is not taught or suggested by O'Brien. Thus, the
Examiner has failed to set forth a prima facie case of obviousness of claim 114. Appellant
respectfully submits that the rejection of claim 114 as obvious over O'Brien is erroneous in law
and fact and, therefore, should be reversed.



Appellant respectfully submits that claim 1 as well as each of claims 11-13, 34-36,
40-42, 113 and 114, which ultimately depend from claim 1 and include every limitation
thereof, are nonobvious and distinguishable from the teachings of O'Brien. Thus, Appellant
respectfully submits that the Examiner has failed to establish claims 1, 11-13, 34-36, 40-42,
113 and 114 as obvious under 35 U.S.C. §103(a) over O'Brien. Accordingly, Appellant
respectfully submits that the rejection of claims 1, 11-13, 34-36, 40-42, 113 and 114 as
obvious over O'Brien is erroneous in law and fact and, therefore, should be reversed.

VIII. CONCLUSIONS 

Appellant respectfully submits that the rejection of claims 1, 11, 20, 34-36, 40-42, 113
and 114 under 35 U.S.C. §112, first paragraph, as allegedly containing subject matter that
was not described in the specification in such a way as to reasonably convey to one skilled in
the art that the inventor, at the time the application was filed, had possession of the claimed
subject matter, is erroneous in law and fact and, therefore, should be reversed.

Appellant also respectfully submits that the rejection of claims 1, 11, 20, 34-36,
40-42, 113 and 114 under 35 U.S.C. §112, first paragraph, because the specification allegedly
fails to describe the claimed subject matter in such a way as to enable one skilled in the art to
make and use the claimed subject matter commensurate in scope with these claims, is
erroneous in law and fact and, therefore, should be reversed.



Summary 






Appellant also respectfully submits that the Examiner has failed to establish claims 1,
11-13, 20, 34-36, 40-42, 113 and 114 as anticipated by Takeuchi under 35 U.S.C. §102(b).
Accordingly, Appellant respectfully submits that the rejection of claims 1-3, 19 and 20 as
anticipated by Takeuchi is erroneous in law and fact and, therefore, should be reversed.

Appellant also respectfully submits that the Examiner has failed to establish claims 1,
11-13 and 34 as anticipated by O'Brien under 35 U.S.C. §102(e). Accordingly, Appellant
respectfully submits that the rejection of claims 1, 11-13 and 34 as anticipated by O'Brien is
erroneous in law and fact and, therefore, should be reversed.

Appellant further respectfully submits that the Examiner has failed to establish claims 1,
11-13, 34-36, 40-42, 113 and 114 as obvious under 35 U.S.C. §103(a) over O'Brien.
Accordingly, Appellant respectfully submits that the rejection of claims 1, 11-13, 34-36, 40-42,
113 and 114 as obvious over O'Brien is erroneous in law and fact and, therefore, should be
reversed.



The Director is authorized to charge any fees that may be required, or to credit any
overpayment to Deposit Account No. 02-1818. Please indicate the Attorney Docket No.
119385-00028/1607 on the account statement. If a Petition for Extension of Time is needed,
this paper is to be considered such Petition.



Respectfully submitted, 



Dated: March 16, 2009 



BY: 




Reg. No. 33,779 



Address all correspondence to: 
77202 

Stephanie Seidman 
K&L Gates LLP 

3580 Carmel Mountain Road, Suite 200 
San Diego, California 92130
Telephone: (858) 509-7410 
Facsimile: (858) 509-7460 
email: stephanie.seidman@klgates.com 






CLAIMS APPENDIX 



PENDING CLAIMS ON APPEAL OF 
U.S. PATENT APPLICATION SERIAL NO. 09/776,191 

1. (Rejected) An isolated, substantially purified single-chain polypeptide,
consisting only of a protease domain of a type-II membrane-type serine protease (MTSP) or a
catalytically active fragment thereof as a single chain, wherein:

a free Cys in the protease domain is replaced with another amino acid; and
the MTSP protease domain or catalytically active fragment thereof has serine protease
activity as a single chain.

2.-9. (Cancelled).

10. (Withdrawn) The substantially purified polypeptide of claim 1, wherein the
MTSP portion has an N-terminus that comprises IVNG, ILGG, VGLL or ILGG.

11. (Rejected) The substantially purified polypeptide of claim 1, wherein the MTSP
is selected from among MTSP1, MTSP3, MTSP4 and MTSP6.

12. (Rejected) The substantially purified polypeptide of claim 1, wherein the MTSP
protease domain consists of a sequence of amino acid residues selected from among amino
acids 615-855 of SEQ ID No. 2, amino acids 205-437 of SEQ ID NO. 4, the amino acid
residues set forth as SEQ ID No. 6 or as amino acids 217-443 in SEQ ID No. 12.

13. (Rejected) The substantially purified polypeptide of claim 1 that has at least about
95% sequence identity with a protease domain consisting of a sequence of amino acid residues
selected from among amino acids 615-855 of SEQ ID No. 2, amino acids 205-437 of SEQ ID
NO. 4, the amino acids set forth as SEQ ID No. 6, and amino acids 217-443 in SEQ ID No. 12.

Claims 14-19 (Cancelled).

20. (Rejected) The polypeptide of claim 1, wherein a free Cys in the protease
domain is replaced with a serine.

Claims 21-33 (Cancelled).

34. (Rejected) The polypeptide of claim 1, wherein the MTSP is selected from
among corin, MTSP1, enteropeptidase, human airway trypsin-like protease (HAT),
TMPRSS2, and TMPRSS4.

35. (Rejected) A conjugate, comprising:

a) a polypeptide of claim 1, and

b) a targeting agent linked to the protein directly or via a linker, wherein the
conjugate has serine protease activity.




36. (Rejected) The conjugate of claim 35, wherein the targeting agent permits 

i) affinity isolation or purification of the conjugate; 

ii) attachment of the conjugate to a surface; 

iii) detection of the conjugate; or 

iv) targeted delivery to a selected tissue or cell.

Claims 37-39 (Cancelled).

40. (Rejected) A solid support comprising two or more polypeptides of claim 1 
linked thereto either directly or via a linker. 

41. (Rejected) The support of claim 40, wherein the polypeptides comprise an
array.

42. (Rejected) The support of claim 41, wherein the array comprises polypeptides 
having different MTSP protease domains. 

43. (Withdrawn) A method for identifying candidate anti-tumor compounds that 
inhibit the protease activity of an MTSP, comprising: 

contacting a polypeptide of claim 1 with a substrate proteolytically cleaved by the 
MTSP, and, either simultaneously, before or after, adding a test compound or plurality thereof; 
measuring the amount of substrate cleaved in the presence of the test compound; and 
selecting compounds that change the amount cleaved compared to a control, whereby 
compounds that modulate the activity of the MTSP are identified. 

44. (Withdrawn) The method of claim 43, wherein the test compounds are small
molecules, peptides, peptidomimetics, natural products, antibodies or fragments thereof.

45. (Withdrawn) The method of claim 43, wherein a plurality of the test 
compounds are screened simultaneously. 

46. (Withdrawn) The method of claim 43, wherein the change in the amount 
cleaved is assessed by comparing the amount cleaved in the presence of the test compound 
with the amount in the absence of the test compound. 

47. (Cancelled) 

48. (Withdrawn) The method of claim 43, wherein a plurality of the polypeptides 
are linked to a solid support, either directly or via a linker. 

49. (Withdrawn) The method of claim 43, wherein the polypeptides comprise an
array.

50. (Withdrawn) The method of claim 43, wherein the polypeptides comprise a
plurality of different MTSP proteases.

51. (Withdrawn) A method of identifying a compound that specifically binds to a
single chain protease domain of an MTSP, comprising: 

contacting a polypeptide of claim 1 with a test compound or plurality thereof under 
conditions conducive to binding thereof; and 

identifying compounds that specifically bind to the MTSP single chain protease domain or 
compounds that inhibit binding of a compound known to bind to the MTSP single chain 
protease domain, wherein the known compound is contacted with the polypeptide before, 
simultaneously with or after the test compound. 

52. (Withdrawn) The method of claim 51, wherein the polypeptide is linked either
directly or indirectly via a linker to a solid support. 

53. (Withdrawn) The method of claim 51, wherein the test compounds are small
molecules, peptides, peptidomimetics, natural products, antibodies or fragments thereof.

54. (Withdrawn) The method of claim 51, wherein a plurality of the test substances 
are screened for simultaneously. 

55. (Withdrawn) The method of claim 52, wherein a plurality of the polypeptides 
are linked to a solid support. 

56.-107. (Cancelled).

108. (Withdrawn) A conjugate, comprising: 

a) an MTSP3 or an MTSP4 or the MTSP6 of claim 12; and 

b) a targeting agent linked to the protein directly or via a linker. 

109. (Withdrawn) The conjugate of claim 108, wherein the targeting agent permits 

i) affinity isolation or purification of the conjugate; 

ii) attachment of the conjugate to a surface; 

iii) detection of the conjugate; or 

iv) targeted delivery to a selected tissue or cell.

Claims 110-112 (Cancelled).

113. (Rejected) A solid support comprising two or more polypeptides of claim 12
linked thereto either directly or via a linker.

114. (Rejected) The support of claim 113, wherein the polypeptides comprise an
array.

115. (Withdrawn) A method for identifying compounds that modulate the protease
activity of an MTSP of claim 1, comprising:

contacting the MTSP of claim 1 with a substrate proteolytically cleaved by the MTSP,
and, either simultaneously, before or after, adding a test compound or plurality thereof;
measuring the amount of substrate cleaved in the presence of the test compound; and
selecting compounds that change the amount cleaved compared to a control, whereby
compounds that modulate the activity of the MTSP are identified.

116. (Withdrawn) The method of claim 115, wherein the test compounds are small
molecules, peptides, peptidomimetics, natural products, antibodies or fragments thereof.

117. (Cancelled). 

118. (Withdrawn) The method of claim 115, wherein the change in the amount 
cleaved is assessed by comparing the amount cleaved in the presence of the test compound 
with the amount in the absence of the test compound. 

119. (Withdrawn) The method of claim 115, wherein a plurality of the test substances 
are screened for simultaneously. 

120. (Withdrawn) The method of claim 119, wherein a plurality of the polypeptides 
are linked to a solid support. 

121. (Cancelled). 

122. (Withdrawn) A method of identifying a compound that specifically binds to an 
MTSP protease domain, comprising: 

contacting an MTSP protease domain of claim 12 with a test compound or plurality thereof 
under conditions conducive to binding thereof; and 

identifying compounds that specifically bind to the MTSP. 

123. (Withdrawn) The method of claim 122, wherein the polypeptide is linked either 
directly or indirectly via a linker to a solid support. 

124. (Withdrawn) The method of claim 122, wherein the test compounds are small 
molecules, peptides, peptidomimetics, natural products, antibodies or fragments thereof. 

125. (Withdrawn) The method of claim 122, wherein a plurality of the test substances
are screened for simultaneously.

126. (Withdrawn) The method of claim 125, wherein a plurality of the polypeptides
are linked to a solid support.

127.-137. (Cancelled).






EVIDENCE APPENDIX 



EXHIBIT 1: Final Office Action, dated March 26, 2008.
EXHIBIT 2: Non-final Office Action, dated June 25, 2007.
EXHIBIT 3: Takeuchi et al., Proc. Natl. Acad. Sci. USA 96: 11054-11061 (1999).
EXHIBIT 4: O'Brien et al., U.S. Patent No. 5,972,616.
EXHIBIT 5: Bachovchin et al., Proc. Natl. Acad. Sci. 78: 7323-7326 (1981).
EXHIBIT 6: Brinkley, "A Brief Survey of Methods for Preparing Protein Conjugates with
Dyes, Haptens, and Cross-linking Reagents" in Perspectives in Bioconjugate
Chemistry (Claude Meares, ed., 1993, Chapter 4, pages 59-70).
EXHIBIT 7: Bryan, Biochem. Biophys. Acta 1543: 200-203 (2000).
EXHIBIT 8: Carter et al., Nature 332: 564-568 (1988).
EXHIBIT 9: Cheah et al., J. Biol. Chem. 265: 7180-7187 (1990).
EXHIBIT 10: Craik et al., Science 237: 909-913 (1987).
EXHIBIT 11: Dawson et al., U.S. Pat. No. 5,645,833 (1997).
EXHIBIT 12: Devereux et al., Nucleic Acids Research 12(1): 387-395 (1984).
EXHIBIT 13: Farley et al., Biochem. Biophys. Acta 1173: 350-352 (1993).
EXHIBIT 14: Hooper et al., Eur. J. Biochem. 267: 6931-6937 (2000).
EXHIBIT 15: Hooper et al., J. Biol. Chem. 276: 857-860 (2001).
EXHIBIT 16: Jacquinet et al., FEBS Lett. 468: 93-100 (2000).
EXHIBIT 17: Kitamoto et al., Proc. Natl. Acad. Sci. USA 91: 7588-7592 (1994).
EXHIBIT 18: Kitamoto et al., Biochem. 27: 4562-4568 (1995).
EXHIBIT 19: Leytus et al., Biochemistry 27: 1067-1074 (1988).
EXHIBIT 20: Lin et al., J. Biol. Chem. 274: 18231-18236 (1999).
EXHIBIT 21: Lu et al., J. Mol. Biol. 292: 361-373 (1999).
EXHIBIT 22: Matsushima et al., J. Biol. Chem. 269: 19976-19982 (1994).
EXHIBIT 23: Means & Feeney, "Chemical Modifications of Proteins: History and
Applications" in Perspectives in Bioconjugate Chemistry (Claude Meares, ed.,
1993, Chapter 2, pages 10-20).
EXHIBIT 24: Nienaber et al., J. Biol. Chem. 275: 7239-48 (2000).
EXHIBIT 25: O'Brien et al., International PCT application No. WO 00/52044.
EXHIBIT 26: Parks et al., J. Biol. Chem. 268: 19101-19109 (1993).
EXHIBIT 27: Parks & Lamb, Cell 64: 777-787 (1991).
EXHIBIT 28: Pearson et al., Proc. Natl. Acad. Sci. USA 85: 2444 (1988).
EXHIBIT 29: Pearson et al., Cabios Invited Review 13(4): 325-332 (1997).
EXHIBIT 30: Perona & Craik, Protein Science 4: 337-360 (1995).



EXHIBIT 31: Paoloni-Giacobino et al., Genomics 44: 309-320 (1997).
EXHIBIT 32: Silverman et al., Curr. Opin. Chem. Biol. 2: 397-403 (1998).
EXHIBIT 33: Sittampalam et al., Curr. Opin. Chem. Biol. 1: 384-391 (1997).
EXHIBIT 34: Sommerhoff et al., Proc. Natl. Acad. Sci. USA 96: 10984-10991 (1999).
EXHIBIT 35: Sprang et al., Science 237: 905-909 (1987).
EXHIBIT 36: Tomita et al., J. Biochem. 124: 784-789 (1998).
EXHIBIT 37: Tsuji et al., J. Biol. Chem. 266(25): 16948-16953 (1991).
EXHIBIT 38: Vu et al., J. Biol. Chem. 272: 31315-31320 (1997).
EXHIBIT 39: Wallrapp et al., Cancer 60: 2602-2606 (2000).
EXHIBIT 40: Walter et al., Annu. Rev. Cell Biol. 2: 499-516 (1986).
EXHIBIT 41: Xu et al., J. Biol. Chem. 275: 378-385 (2000).
EXHIBIT 42: Yahagi et al., Biochem. Biophys. Res. Commun. 219: 806-812 (1996).
EXHIBIT 43: Yamaoka et al., J. Biol. Chem. 273: 11895-11901 (1998).
EXHIBIT 44: Yan et al., J. Biol. Chem. 274: 14926-14935 (1999).
EXHIBIT 45: Zubay, Biochemistry (1983), pages 12-13.
EXHIBIT 46: Office Action, mailed April 21, 2006.






RELATED PROCEEDINGS APPENDIX 



None 






EVIDENCE APPENDIX 



EXHIBIT 1: Final Office Action, dated March 26, 2008.

EXHIBIT 2: Non-final Office Action, dated June 25, 2007.

EXHIBIT 3: Takeuchi et al., Proc. Natl. Acad. Sci. USA 96: 11054-11061 (1999).

EXHIBIT 4: O'Brien et al., U.S. Patent No. 5,972,616.

EXHIBIT 5: Bachovchin et al., Proc. Natl. Acad. Sci. 78: 7323-7326 (1981).

EXHIBIT 6: Brinkley, "A Brief Survey of Methods for Preparing Protein Conjugates with
Dyes, Haptens, and Cross-linking Reagents" in Perspectives in Bioconjugate
Chemistry (Claude Meares, ed. 1993, Chapter 4, pages 59-70).

EXHIBIT 7: Bryan, Biochem. Biophys. Acta 1543: 200-203 (2000).

EXHIBIT 8: Carter et al., Nature 332: 564-568 (1988).

EXHIBIT 9: Cheah et al., J. Biol. Chem. 265: 7180-7187 (1990).

EXHIBIT 10: Craik et al., Science 237: 909-913 (1987).

EXHIBIT 11: Dawson et al., U.S. Pat. No. 5,645,833 (1997).

EXHIBIT 12: Devereux et al., Nucleic Acids Research 12(1): 387-395 (1984).

EXHIBIT 13: Farley et al., Biochem. Biophys. Acta 1173: 350-352 (1993).

EXHIBIT 14: Hooper et al., Eur. J. Biochem. 267: 6931-6937 (2000).

EXHIBIT 15: Hooper et al., J. Biol. Chem. 276: 857-860 (2001).

EXHIBIT 16: Jacquinet et al., FEBS Lett. 468: 93-100 (2000).

EXHIBIT 17: Kitamoto et al., Proc Natl Acad Sci USA 91: 7588-7592 (1994).

EXHIBIT 18: Kitamoto et al., Biochem. 27: 4562-4568 (1995).

EXHIBIT 19: Leytus et al., Biochemistry 27: 1067-1074 (1988).

EXHIBIT 20: Lin et al., J. Biol. Chem. 274: 18231-18236 (1999).

EXHIBIT 21: Lu et al., J. Mol. Biol. 292: 361-373 (1999).

EXHIBIT 22: Matsushima et al., J. Biol. Chem. 269: 19976-19982 (1994).

EXHIBIT 23: Means & Feeney, "Chemical Modifications of Proteins: History and
Applications" in Perspectives in Bioconjugate Chemistry (Claude Meares, ed.,
1993, Chapter 2, pages 10-20).

EXHIBIT 24: Nienaber et al., J. Biol. Chem. 275: 7239-48 (2000).

EXHIBIT 25: O'Brien et al., International PCT application No. WO 00/52044.

EXHIBIT 26: Parks et al., J. Biol. Chem. 268: 19101-19109 (1993).

EXHIBIT 27: Parks & Lamb, Cell 64: 777-787 (1991).

EXHIBIT 28: Pearson et al., Proc. Natl. Acad. Sci. USA 85: 2444 (1988).

EXHIBIT 29: Pearson et al., Cabios Invited Review 13(4): 325-332 (1997).

EXHIBIT 30: Perona & Craik, Protein Science 4: 337-360 (1995).






EXHIBIT 31: Paoloni-Giacobino et al., Genomics 44: 309-320 (1997).

EXHIBIT 32: Silverman et al., Curr. Opin. Chem. Biol., 2: 397-403 (1998).

EXHIBIT 33: Sittampalam et al., Curr. Opin. Chem. Biol., 1: 384-391 (1997).

EXHIBIT 34: Sommerhoff et al., Proc. Natl. Acad. Sci. USA 96: 10984-10991 (1999).

EXHIBIT 35: Sprang et al., Science 237: 905-909 (1987).

EXHIBIT 36: Tomita et al., J. Biochem. 124: 784-789 (1998).

EXHIBIT 37: Tsuji et al., J Biol Chem 266(25): 16948-16953 (1991).

EXHIBIT 38: Vu et al., J. Biol. Chem. 272: 31315-31320 (1997).

EXHIBIT 39: Wallrapp et al., Cancer 60: 2602-2606 (2000).

EXHIBIT 40: Walter et al., Annu. Rev. Cell Biol. 2: 499-516 (1986).

EXHIBIT 41: Xu et al., J. Biol. Chem. 275: 378-385 (2000).

EXHIBIT 42: Yahagi et al., Biochem. Biophys. Res. Commun. 219: 806-812 (1996).
EXHIBIT 43: Yamaoka et al., J. Biol. Chem. 273: 11895-11901 (1998).
EXHIBIT 44: Yan et al., J. Biol. Chem. 274: 14926-14935 (1999).
EXHIBIT 45: Zubay, Biochemistry (1983), pages 12-13. 
EXHIBIT 46: Office Action, mailed April 21, 2006. 





Exhibit 1 




United States Patent and Trademark Office

UNITED STATES DEPARTMENT OF COMMERCE
United States Patent and Trademark Office
Address: COMMISSIONER FOR PATENTS
P.O. Box 1450
Alexandria, Virginia 22313-1450
www.uspto.gov



APPLICATION NO.: 09/776,191
FILING DATE: 02/02/2001
FIRST NAMED INVENTOR: Edwin L. Madison
ATTORNEY DOCKET NO.: 119385-00028 / 1607
CONFIRMATION NO.: 3237



20985 7590 03/26/2008 

FISH & RICHARDSON, PC 
P.O. BOX 1022 

MINNEAPOLIS, MN 55440-1022 



EXAMINER: PAK, YONG D
ART UNIT: 1652
PAPER NUMBER:
MAIL DATE: 03/26/2008
DELIVERY MODE: PAPER



Please find below and/or attached an Office communication concerning this application or proceeding. 



The time period for reply, if any, is set in the attached communication. 



PTOL-90A (Rev. 04/07) 





Office Action Summary

Application No.: 09/776,191
Applicant(s): MADISON ET AL.
Examiner: Yong D. Pak
Art Unit: 1652





- The MAILING DATE of this communication appears on the cover sheet with the correspondence address - 
Period for Reply 

A SHORTENED STATUTORY PERIOD FOR REPLY IS SET TO EXPIRE 3 MONTH(S) OR THIRTY (30) DAYS,
WHICHEVER IS LONGER, FROM THE MAILING DATE OF THIS COMMUNICATION.

- Extensions of time may be available under the provisions of 37 CFR 1.136(a). In no event, however, may a reply be timely filed
after SIX (6) MONTHS from the mailing date of this communication.

- If NO period for reply is specified above, the maximum statutory period will apply and will expire SIX (6) MONTHS from the mailing date of this communication.

- Failure to reply within the set or extended period for reply will, by statute, cause the application to become ABANDONED (35 U.S.C. § 133).
Any reply received by the Office later than three months after the mailing date of this communication, even if timely filed, may reduce any
earned patent term adjustment. See 37 CFR 1.704(b).

Status 

1) [X] Responsive to communication(s) filed on 26 December 2007.
2a) [X] This action is FINAL. 2b) [ ] This action is non-final.

3) [ ] Since this application is in condition for allowance except for formal matters, prosecution as to the merits is
closed in accordance with the practice under Ex parte Quayle, 1935 C.D. 11, 453 O.G. 213.

Disposition of Claims 

4) [X] Claim(s) See Continuation Sheet is/are pending in the application.

4a) Of the above claim(s) 10, 43-46, 48-55, 108, 109, 115, 116, 118-120 and 122-126 is/are withdrawn from
consideration.

5) [ ] Claim(s) is/are allowed.

6) [X] Claim(s) 1, 11-13, 20, 34-36, 40-42, 113 and 114 is/are rejected.

7) [ ] Claim(s) is/are objected to.

8) [ ] Claim(s) are subject to restriction and/or election requirement.

Application Papers 

9) [ ] The specification is objected to by the Examiner.

10) [ ] The drawing(s) filed on is/are: a) [ ] accepted or b) [ ] objected to by the Examiner.

Applicant may not request that any objection to the drawing(s) be held in abeyance. See 37 CFR 1.85(a).
Replacement drawing sheet(s) including the correction is required if the drawing(s) is objected to. See 37 CFR 1.121(d).

11) [ ] The oath or declaration is objected to by the Examiner. Note the attached Office Action or form PTO-152.

Priority under 35 U.S.C. § 119 

12) [ ] Acknowledgment is made of a claim for foreign priority under 35 U.S.C. § 119(a)-(d) or (f).
a) [ ] All b) [ ] Some * c) [X] None of:

1. [ ] Certified copies of the priority documents have been received.

2. [ ] Certified copies of the priority documents have been received in Application No. .

3. [ ] Copies of the certified copies of the priority documents have been received in this National Stage
application from the International Bureau (PCT Rule 17.2(a)).
* See the attached detailed Office action for a list of the certified copies not received.



Attachment(s) 

1) [ ] Notice of References Cited (PTO-892)

2) [ ] Notice of Draftsperson's Patent Drawing Review (PTO-948)

3) [X] Information Disclosure Statement(s) (PTO/SB/08)
Paper No(s)/Mail Date 12/26/07.

4) [ ] Interview Summary (PTO-413)
Paper No(s)/Mail Date.

5) Notice of Informal Patent Application

6) Other:



U.S. Patent and Trademark Office
PTOL-326 (Rev. 08-06) 



Office Action Summary 



Part of Paper No./Mail Date 20080312 



Continuation Sheet (PTOL-326) Application No. 09/776,191

Continuation of Disposition of Claims: Claims pending in the application are 1, 10-13, 20, 34-36, 40-46, 48-55, 108, 109, 113-116, 118-120 and 122-126.






Application/Control Number: 09/776,191 Page 2 

Art Unit: 1652 

DETAILED ACTION 

This application is a CIP of 09/657,986, now issued as U.S. Patent No. 
6,797,504. 

The amendment filed on December 26, 2007, amending claim 1 and canceling 
claims 2-3 and 19, has been entered. 

Claims 1, 10-13, 20, 34-36, 40-46, 48-55, 108-109, 113-116, 118-120 and 122-126
are pending. Claims 10, 43-46, 48-55, 108-109, 115-116, 118-120 and 122-126
are withdrawn. Claims 1, 11-13, 20, 34-36, 40-42 and 113-114 are under consideration.

Priority 

Applicant's claim for domestic priority under 35 U.S.C. 119(e) is acknowledged.
However, the provisional applications upon which priority is claimed fail to provide
adequate support under 35 U.S.C. 112 for claims 11-13 and 34 of this application.

Provisional applications 60/179,982, 60/183,542, 60/213,124, 60/220,970 and
60/234,840 fail to provide adequate support for polypeptides comprising the serine
protease domain of MTSP1. Provisional applications 60/179,982 and 60/183,542
describe polypeptides related to MTSP3 and provisional applications 60/213,124,
60/220,970 and 60/234,840 describe polypeptides related to MTSP4.

Therefore, the effective filing date for purpose of prior art is the filing date of 
09/657,986, which is 9/8/2000. 



Information Disclosure Statement 




The information disclosure statement (IDS) submitted on December 26, 2007 
was filed after the mailing date of the Non-Final Rejection on June 25, 2007. The 
submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the
information disclosure statement is being considered by the examiner. 

Response to Arguments 

Applicant's amendment and arguments filed on December 26, 2007, have been 
fully considered and are deemed to be persuasive to overcome some of the rejections 
previously applied. Rejections and/or objections not reiterated from previous office 
actions are hereby withdrawn. 

Claim Objections 

Applicants argue that claims 11-13 and 34 should be retained pending a
determination of the allowability of claim 1, which is a linking claim, linking the elected
subject matter. In view of applicant's argument, the objection to claims 11-13 and 34
has been withdrawn.

Claim Rejections - 35 USC § 112, 2nd paragraph

In view of applicant's argument, the rejection of claims 1, 11-13 and claims 20, 
34-36, 40-42 and 113-114 depending therefrom under 35 U.S.C. 112, second
paragraph, as being indefinite for failing to particularly point out and distinctly claim the
subject matter which applicant regards as the invention has been withdrawn. 



Claim Rejections - 35 USC § 112, 1st paragraph

The following is a quotation of the first paragraph of 35 U.S.C. 112: 

The specification shall contain a written description of the invention, and of the manner and process of 
making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the 
art to which it pertains, or with which it is most nearly connected, to make and use the same and shall 
set forth the best mode contemplated by the inventor of carrying out his invention. 

Claims 1, 11, 20, 34-36, 40-42 and 113-114 are rejected under 35 U.S.C. 112, 
first paragraph, as containing subject matter which was not described in the 
specification in such a way as to reasonably convey to one skilled in the relevant art that 
the inventor(s), at the time the application was filed, had possession of the claimed 
invention. 

Claims 1, 11, 20, 34-36, 40-42 and 113-114 are drawn to a polypeptide
consisting of a protease domain or catalytically active fragment thereof of type-II
membrane-type serine protease (MTSP) from any source. Claims 11 and 34 limit the
MTSP polypeptide to a MTSP1 polypeptide from any source. Therefore, these claims
are drawn to a genus of polypeptides having any structure. The specification only
teaches four species, amino acids 615-855 of SEQ ID NO:2 (MTSP1), amino acids
205-437 of SEQ ID NO:4 (MTSP3), amino acids of SEQ ID NO:6 (MTSP4) and amino
acids 217-443 of SEQ ID NO:11 (MTSP6). These species are not enough to describe
the whole genus and there is no evidence on the record of the relationship between the
structure of the above catalytically active protease domains of SEQ ID NOs: 2, 4, 6 and
11 and the structure of the serine protease domain of any or all MTSP polypeptides or
MTSP1 polypeptides. Further, the specification does not describe the structure of a 
catalytically active fragment of a protease domain of any or all MTSP polypeptide. 
Therefore, the specification fails to describe a representative species of the genus of 
polypeptides consisting of a serine protease domain or a catalytically active portion of a 
MTSP polypeptide. 

Given this lack of description of the representative species encompassed by the 
genus of the claims, the specification fails to sufficiently describe the claimed invention 
in such full, clear, concise, and exact terms that a skilled artisan would recognize that 
applicants were in possession of the inventions of claims 1, 11, 20, 34-36, 40-42 and
113-114. 

Applicant is referred to the revised guidelines concerning compliance with the
written description requirement of 35 U.S.C. 112, first paragraph, published in the Official
Gazette and also available at www.uspto.gov.

In response to the previous Office Action, applicants have traversed the above 
rejection. 

Applicants argue that the claims are fully described because the specification 
identified 17 members of the MTSP family and identifies the protease domains thereof, 
unknown MTSPs and its protease domains. Examiner respectfully disagrees. The 
claims are not limited to specific protease domains of specific MTSP proteins, but the 
claims are drawn to polypeptides consisting of any protease domains or any or all
catalytically active fragments of said protease domains of any or all MTSP or any or all
MTSP1, including any or all recombinants, variants and mutants of said MTSP or
MTSP1. The recitation of "protease domain of a MTSP" or "MTSP1" fails to provide a
sufficient description of the claimed genus of polypeptides as it merely describes the
functional features of the genus without providing any definition of the structural features
of the species within the genus. The CAFC in UC California v. Eli Lilly, (43 USPQ2d
1398) stated that: "in claims to genetic material, however, a generic statement such as
'vertebrate insulin cDNA' or 'mammalian insulin cDNA,' without more, is not an
adequate written description of the genus because it does not distinguish the claimed 
genus from others, except by function. It does not specifically define any of the genes 
that fall within its definition. It does not define any structural features commonly 
possessed by members of the genus that distinguish them from others. One skilled in 
the art therefore cannot, as one can do with a fully described genus, visualize or 
recognize the identity of the members of the genus." Similarly with the claimed genus of 
protease domains, the functional definition of the genus does not provide any structural 
information commonly possessed by members of the genus which distinguish the 
species within the genus from other proteins such that one can visualize or recognize 
the identity of the members of the genus. 

Further, as discussed in the written description guidelines, the written description 
requirement for a claimed genus may be satisfied through sufficient description of a 
representative number of species by actual reduction to practice, reduction to drawings, 
or by disclosure of relevant, identifying characteristics, i.e., structure or other physical
and/or chemical properties, by functional characteristics coupled with a known or
disclosed correlation between function and structure, or by a combination of such 
identifying characteristics, sufficient to show the applicant was in possession of the 
claimed genus. A representative number of species means that the species which are 
adequately described are representative of the entire genus. Thus, when there is 
substantial variation within the genus, one must describe a sufficient variety of 
species to reflect the variation within the genus. Satisfactory disclosure of a 
representative number depends on whether one of skill in the art would recognize that 
the applicant was in possession of the necessary common attributes or features of the 
elements possessed by the members of the genus in view of the species disclosed. For 
inventions in an unpredictable art, adequate written description of a genus which 
embraces widely variant species cannot be achieved by disclosing only one species 
within the genus. In the instant case the claimed genera of the claims are drawn to
species which are widely variant in structure. The genus of the claims is structurally
diverse as it encompasses any catalytically active protease domains of any or all MTSP
or MTSP1, except for having serine protease activity. As such, the description of
solely structural features present in all members of the genus is not sufficient to be
representative of the attributes and features of the entire genus.

Applicants also argue that the claims are fully described because members of 
the MTSP family of serine proteases were well known at the time of filing, such as 
conserved characteristic structural elements and protease domains, and methods of
identifying serine protease domains were known in the art. Examiner respectfully
disagrees. As discussed above, the claims are not drawn to the specific protease
domains of specific MTSP type II, but to polypeptides consisting of any protease
domains or any or all catalytically active fragments of said protease domains of any or
all MTSP or any or all MTSP1, including any or all recombinants, variants and mutants
of said MTSP or MTSP1. In view of the widely variant species encompassed by the
genus, the species disclosed in the specification are not enough and do not constitute
a representative number of species to describe the whole genus of any or all variants,
recombinants and mutants of any or all polypeptides having serine protease activity
isolated from any or all sources, including any or all variants, recombinants and mutants
thereof, and there is no evidence on the record of the relationship between the structure
of the protease domain of the specific MTSPs disclosed in the specification and the
structure of any or all recombinants, variants and mutants of any or all polypeptides having
serine protease activity. Therefore, the specification fails to describe a representative
species of the genus comprising any or all polypeptides having serine protease activity,
including any or all variants, recombinants and mutants thereof.

Applicants also argue that the claims are fully described by the specification 
because one skilled in the art would recognize applicant's possession of the claimed 
subject matter. Examiner respectfully disagrees. As discussed above, the claims are 
not drawn to the specific protease domains of specific MTSP type II, but to polypeptides 
consisting of any protease domains or any or all catalytically active fragments of said 
protease domains of any or all MTSP or any or all MTSP1, including any or all 
recombinants, variants and mutants of said MTSP or MTSP1. The claimed genera of
the claims are drawn to species which are widely variant in structure. The genus of the
claims is structurally diverse as it encompasses any catalytically active protease
domains of any or all MTSP or MTSP1, except for having serine protease activity. As
such, the description of solely structural features present in all members of the
genus is not sufficient to be representative of the attributes and features of the entire genus.
Hence the rejection is maintained.

Claims 1, 11, 20, 34-36, 40-42 and 113-114 are rejected under 35 U.S.C. 112,
first paragraph, because the specification, while being enabling for a polypeptide
consisting of amino acids 615-855 of SEQ ID NO:2, does not reasonably provide
enablement for a polypeptide consisting of any protease domain of any type II
membrane type serine protease (MTSP) or MTSP1 or a catalytically active portion
thereof. The specification does not enable any person skilled in the art to which it
pertains, or with which it is most nearly connected, to make and use the invention 
commensurate in scope with these claims. 

Factors to be considered in determining whether undue experimentation is
required are summarized in In re Wands, 858 F.2d 731, 8 USPQ2d 1400 (Fed. Cir.
1988). They include (1) the quantity of experimentation necessary, (2) the amount of
direction or guidance presented, (3) the presence or absence of working examples, (4) 
the nature of the invention, (5) the state of the prior art, (6) the relative skill of those in 
the art, (7) the predictability or unpredictability of the art, and (8) the breadth of the 
claims. 




Claims 1, 11, 20, 35-36, 40-42 and 113-114 are drawn to a polypeptide 
consisting of a protease domain or catalytically active fragment thereof of a type-II
membrane-type serine protease (MTSP) from any source. Claims 11 and 34 limit the
MTSP polypeptide to a MTSP1 polypeptide from any source. Therefore, these claims 
are drawn to polypeptides having undefined structure. 

The scope of the claims is not commensurate with the enablement provided by 
the disclosure with regard to the extremely large number of polypeptides comprising a 
protease or catalytically active domain broadly encompassed by the claims. Since the 
amino acid sequence of a protein determines its structural and functional properties, 
predictability of which changes can be tolerated in a protein's amino acid sequence and 
obtain the desired activity requires a knowledge of and guidance with regard to which 
amino acids in the protein's sequence, if any, are tolerant of modification and which are 
conserved (i.e. expectedly intolerant to modification), and detailed knowledge of the 
ways in which the protein's structure relates to its function. However, in this case the
disclosure is limited to the polypeptide comprising amino acids 615-855 of SEQ ID 
NO:2, or the amino acids of SEQ ID NO:50. 

It would require undue experimentation of the skilled artisan to make and use the 
claimed polypeptides. The specification is limited to teaching the use of polypeptide 
consisting of amino acids 615-855 of SEQ ID NO:2 or the amino acids of SEQ ID NO:50
but provides no guidance with regard to the making of variants and mutants or with
regard to other uses. In view of the great breadth of the claim, amount of
experimentation required to make the claimed polypeptides, the lack of guidance,
working examples, and unpredictability of the art in predicting function from a
polypeptide primary structure, the claimed invention would require undue 
experimentation. As such, the specification fails to teach one of ordinary skill how to 
use the full scope of the polypeptides encompassed by the claims. 

While enzyme isolation techniques, recombinant and mutagenesis techniques 
are known, and it is routine in the art to screen for multiple substitutions or multiple 
modifications as encompassed by the instant claims, the specific amino acid positions 
within a protein's sequence where amino acid modifications can be made with a 
reasonable expectation of success in obtaining the desired activity/utility are limited in 
any protein and the result of such modifications is unpredictable. In addition, one skilled 
in the art would expect any tolerance to modification for a given protein to diminish with 
each further and additional modification, e.g. multiple substitutions. 

The specification does not support the broad scope of the claims which 
encompass all modifications and variants of a protease or catalytically active domain or 
modifications of amino acids 615-655 of SEQ ID NO:2 because the specification does 
not establish: (A) regions of the protein structure which may be modified without 
affecting MTSP/serine protease activity; (B) the general tolerance of MTSP to 
modification and extent of such tolerance; (C) a rational and predictable scheme for 
modifying any amino acid residue with an expectation of obtaining the desired biological 
function; and (D) the specification provides insufficient guidance as to which of the 
essentially infinite possible choices is likely to be successful. 




Thus, applicants have not provided sufficient guidance to enable one of ordinary 
skill in the art to make and use the claimed invention in a manner reasonably correlated 
with the scope of the claims broadly including protease or catalytically active domains of 
MTSP with an enormous number of amino acid modifications of the MTSP polypeptides 
and of amino acids 615-855 of SEQ ID NO:2. The scope of the claims must bear a
reasonable correlation with the scope of enablement (In re Fisher, 166 USPQ 19, 24
(CCPA 1970)). Without sufficient guidance, determination of the serine protease
domain or the catalytically active domain of MTSP having the desired biological
characteristics is unpredictable and the experimentation left to those skilled in the art is
unnecessarily, and improperly, extensive and undue. See In re Wands, 858 F.2d 731, 8
USPQ2d 1400 (Fed. Cir. 1988).

In response to the previous Office Action, applicants have traversed the above 
rejection. 

Applicants argue that the claims are enabled because the level of skill in the art 
is high and the specification teaches that MTSP polypeptides constitute a recognized 
well-known and well characterized family of serine protease and the specification 
describes the protease domain of a number of MTSP family members, such as 
conserved features of MTSP protease domains. Examiner respectfully disagrees. The 
scope of the claims, which are drawn to polypeptides consisting of any protease 
domains or any or all catalytically active fragments of said protease domains of any or 
all MTSP or any or all MTSP1, including any or all recombinants, variants and mutants 
of said MTSP or MTSP1, is not commensurate with the enablement provided by the
disclosure with regard to the extremely large number of polypeptides comprising a
protease or catalytically active domain broadly encompassed by the claims. Even
though the structures of some MTSPs are known, the claims are drawn to any or all serine
domains and catalytically active fragments of any or all protease domains of any or all
MTSP or MTSP1. As discussed above, predictability of which changes can be tolerated
in a protein's amino acid sequence and obtain the desired activity requires a specific
knowledge of and guidance with regard to which specific amino acids in the protein's
sequence can be modified such that the modified polypeptide continues to have said
claimed activity. It is this specific guidance that applicants do not provide. While the art
may teach in general the structure of MTSP, conserved amino acid sequences, protease
domains, X-ray crystal structure, etc., such teachings will not reduce the burden of
undue experimentation on those of ordinary skill in the art.

Applicants also argue that the claims are enabled because the knowledge, 
regarding MTSP proteins, of those skilled in the art is high. The Examiner respectfully 
disagrees. The claims are drawn to polypeptides consisting of any protease domains or 
any or all catalytically active fragments of said protease domains of any or all MTSP or 
any or all MTSP1, including any or all recombinants, variants and mutants of said MTSP
or MTSP1. Since the amino acid sequence of the protein determines its structural and
functional properties, predictability of which changes can be tolerated in a protein's
amino acid sequence and obtain the desired activity requires a knowledge of and
guidance with regard to which amino acids in the protein's sequence, if any, are tolerant
of modification and which are conserved (i.e. expectedly intolerant to modification), and
detailed knowledge of the ways in which the protein's structure relates to its function. In
addition, the art does not provide any teaching or guidance as to which amino acids 
within a serine protease can be modified and which ones are conserved such that one 
of skill in the art can make the recited polypeptides having serine protease activity and 
the general tolerance of serine proteases to structural modifications and the extent of 
such tolerance. The art clearly teaches that changes in a protein's amino acid 
sequence to obtain the desired activity without any guidance/knowledge as to which 
amino acids in a protein are required for that activity is highly unpredictable. At the time 
of the invention, there was a high level of unpredictability associated with altering a 
polypeptide sequence with an expectation that the polypeptide will maintain the desired 
activity. For example, Branden et al. (Introduction to Protein Structure, Garland 
Publishing Inc., New York, page 247, 1991 - cited previously on form PTO-892) teach 
that (1) protein engineers are frequently surprised by the range of effects caused by
single mutations that they hoped would change only one specific and simple property in 
enzymes, (2) the often surprising results obtained by experiments where single 
mutations are made reveal how little is known about the rules of protein stability, and (3) 
the difficulties in designing de novo stable proteins with specific functions. 

Applicants argue that the specification discloses working examples, thus a 
person skilled in the art has sufficient guide in making the claimed polypeptides. 
Examiner respectfully disagrees. Even though the structure of some MTSP are taught, 
the claims are not only drawn to polypeptides consisting of catalytically active fragments 
of only MTSP1, MTSP3, MTSP4 and MTSP6, but to any or all mutants, variants and 




recombinants of any MTSP. Without specific guidance, those skilled in the art will be 
subjected to undue experimentation of making and testing each of the enormously large 
number of mutants that results from such experimentation. While the art may teach in 
general the structure of MTSP, conserved amino acid sequences, etc., such 
teachings will not reduce the burden of undue experimentation on those of ordinary skill 
in the art. 

Hence the rejection is maintained. 



Claim Rejections - 35 USC § 102 

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

(e) the invention was described in (1) an application for patent, published under section 122(b), by 
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States 
only if the international application designated the United States and was published under Article 21(2) 
of such treaty in the English language. 



Claims 1-3 and 19-20 were rejected under 35 U.S.C. 102(b) as being anticipated 
by Dawson et al. 

In view of the fact that Dawson et al. do not teach an isolated serine protease 
domain of a MTSP protein, the rejection has been withdrawn. 






Claims 1, 11-13, 20, 34-36, 40-42 and 113-114 are rejected under 35 
U.S.C. 102(b) as being anticipated by Takeuchi et al. 

Claims 1, 11-13, 20 and 34 are drawn to a polypeptide consisting of a serine 
protease domain of MTSP having the characteristics recited in the claims. Claims 
35-36 are drawn to a conjugate comprising a polypeptide comprising a serine protease 
domain of MTSP and a targeting agent. Claims 40-42 and 113-114 are drawn to a 
solid support comprising a polypeptide comprising a serine protease domain of MTSP. 

Takeuchi et al. (Reference I J : PTO-1449) teaches a polypeptide comprising a 
fragment consisting of a serine protease domain that is 100% identical to amino acids 
615-855 of SEQ ID NO:2 of the instant invention (page 11060, 2nd full paragraph). 
Takeuchi et al. discloses a purified activated protease domain, comprising amino acids 
615-855 of SEQ ID NO:2, confirmed by an N-terminal sequence of the purified, 
activated protease domain yielding the expected VVGGT sequence (Figure 3 and right 
column on page 11057). 

Takeuchi et al. teaches a catalytically active polypeptide comprising the serine 
protease domain linked to a His-tag (page 11055, 3rd full paragraph, page 11057, 4th full 
paragraph). Takeuchi et al. also teaches a solid support comprising said polypeptide 
(page 11057, 4th full paragraph and Figure 5). Therefore, the teaching of Takeuchi et 
al. anticipates claims 1, 11-13, 20, 34-36, 40-42 and 113-114. 




Examiner notes that the contents of the reference were made public at the 
National Academy of Sciences colloquium held February 20-21, 1999 (see top of 
reference). 

In response to the previous Office Action, applicants have traversed the above 
rejections. 

Applicants argue that Takeuchi et al. does not anticipate the instant claims 
because the instant claims are drawn to a polypeptide that consists of a protease 
domain or catalytically active portion thereof. Examiner respectfully disagrees. In 
addition to the full-length MT-SP1, Takeuchi et al. also discloses a polypeptide 
consisting of the serine protease domain. The serine protease domain is initially 
expressed in E. coli as a His-tagged fusion, but a renatured active protein lacking the 
His tag was isolated and N-terminal sequencing of this protein yielded VVGGT, which 
corresponds to residues 615-619 of SEQ ID NO:2 of the instant invention. Takeuchi et 
al. discloses that Cys at position 731 forms a disulfide bond with Cys 604 present in the 
pro domain (page 11060). Since the serine protease domain of Takeuchi et al. lacks 
the pro domain of the wildtype protein, Cys residue at position 731 of said serine 
protease domain does not form a disulfide bond and therefore is a "free cysteine". The 
specification on page 58 states that in "the single chain form, the residue at 731 in the 
protease domain is free" (page 58, lines 15-16). Therefore, the serine protease domain 
of Takeuchi et al. is a single chain polypeptide. 

Applicants also argue that the claims are not anticipated by Takeuchi et al. 
because Takeuchi et al. does not disclose replacing a free Cys residue of the serine 






protease domain of an MTSP polypeptide with another amino acid or a serine residue. 
Examiner respectfully disagrees. The limitation "a free Cys in the protease domain is 
replaced with another amino acid" and "a free Cys in the protease domain is replaced 
with a serine" is a product-by-process type limitation. The end result of the products of 
the claims is a serine protease domain or a serine protease domain having a serine 
residue. Whether the product of the claimed protein is obtained by replacing a free 
cysteine residue or not, the product is still the same because the instant claims may be 
produced by the recited modification or not. Therefore, there is no structure 
implied by said limitations. Since the polypeptide of Takeuchi et al. consists of a 
protease domain of a MTSP and the MTSP protease domain has serine protease 
activity, the claims are anticipated by the prior art. Also, since the serine protease 
domain of Takeuchi et al. has a serine residue, claim 20 is also anticipated. 
Hence the rejections are maintained. 

Claim Rejections - 35 USC § 102/103 

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(e) the invention was described in (1) an application for patent, published under section 122(b), by 
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States 
only if the international application designated the United States and was published under Article 21(2) 
of such treaty in the English language. 

The following is a quotation of 35 U.S.C. 103(a), which forms the basis for all 
obviousness rejections, set forth in this Office action: 






(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior 
art are such that the subject matter as a whole would have been obvious at the time the invention was made to 
a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be 
negatived by the manner in which the invention was made. 

Claims 1, 11-13 and 34 are rejected under 35 U.S.C. 103(a) as obvious over O'Brien 
et al. 

Claims 1, 11-13 and 34 are drawn to a polypeptide comprising a serine protease 
domain of MTSP. 

O'Brien et al. (U.S. Patent No. 5,972,616 - reference P- PTO 1449) teaches a 
polypeptide having 100% identity to the full length MTSP1 of SEQ ID NO:2 of the instant 
invention (SEQ ID NO:2, columns 19-24). O'Brien et al. teaches a serine protease 
domain having proteolytic activity that is 100% identical to amino acids 615-855 of SEQ 
ID NO:2 (Figure 2, Figure 10 and SEQ ID NO:14). Further, O'Brien et al. teaches a 
method of expressing polypeptides via a vector in host cells. O'Brien et al. also teaches 
that the protease domain could be released and be used as a diagnostic which has the 
potential for a target for therapeutic intervention (Column 15, lines 35-38). Therefore, it 
would have been obvious to one having ordinary skill in the art at the time the invention 
was made to express the protease domain of SEQ ID NO:14 and purify the polypeptide. 
The motivation of making such a polypeptide is to use it as a diagnostic which has the 
potential for a target for therapeutic intervention. One of ordinary skill in the art would 
have had a reasonable expectation of success since expression of a heterologous 
polypeptide is routine in the art and O'Brien et al. teaches how to express heterologous 
polypeptides. 




Therefore, the above reference renders claims 1, 11-13 and 34 prima facie 
obvious to one of ordinary skill in the art. 

In response to the previous Office Action, applicants have traversed the above 
rejections. 

Applicants also argue that one of skill in the art would recognize the disclosure of 
the polypeptide of O'Brien as not disclosing a single chain polypeptide. Examiner 
respectfully disagrees. Takeuchi et al. discloses that Cys at position 731 forms a 
disulfide bond with Cys 604 present in the pro domain (page 11060). Since the serine 
protease domain of Takeuchi et al. lacks the pro domain of the wildtype protein, Cys 
residue at position 731 of said serine protease domain does not form a disulfide bond 
and therefore is a "free cysteine". The specification on page 58 states that in "the single 
chain form, the residue at 731 in the protease domain is free" (page 58, lines 15-16). 
Therefore, the serine protease domain of O'Brien et al. is a single chain polypeptide. 

Applicants also argue that the claims are not anticipated by O'Brien et al. 
because O'Brien et al. does not disclose replacing a free Cys residue of the serine 
protease domain of an MTSP polypeptide with another amino acid. Examiner 
respectfully disagrees. The limitation "a free Cys in the protease domain is replaced 
with another amino acid" is a product-by-process type limitation. The end result of the 
products of the claims is a serine protease domain. Whether the product of the claimed 
protein is obtained by replacing a free cysteine residue or not, the product is still the 
same because the instant claims may be produced by the recited modification or not. 
Therefore, there is no structure implied by said limitations. Since the 




polypeptide of O'Brien et al. consists of a protease domain of a MTSP and the MTSP 
protease domain has serine protease activity, the claims are anticipated by the prior art. 

Applicants also argue that O'Brien et al. provides no teaching or suggestion of 
smaller fragments having serine protease activity because it does not teach how to 
make a single chain polypeptide that has serine protease activity. Examiner respectfully 
disagrees. O'Brien et al. teaches a method of expressing polypeptides via a vector in 
host cells. It is well within the skill available in the art to purify the protease domain 
since O'Brien et al. identifies the protease domain. Therefore, it would have been 
obvious to one having ordinary skill in the art at the time the invention was made to 
express the protease domain of SEQ ID NO:14 and purify the polypeptide. The 
motivation of making such a polypeptide is to use it as a diagnostic which has the 
potential for a target for therapeutic intervention. One of ordinary skill in the art would 
have had a reasonable expectation of success since expression of a heterologous 
polypeptide is routine in the art and O'Brien et al. teaches how to express heterologous 
polypeptides. Further, since the serine protease domain of Takeuchi et al. lacks the 
pro domain of the wildtype protein, Cys residue at position 731 of said serine protease 
domain does not form a disulfide bond and therefore is a "free cysteine". The 
specification on page 58 states that in "the single chain form, the residue at 731 in the 
protease domain is free" (page 58, lines 15-16). Also, as discussed previously, the 
limitation "a free Cys in the protease domain is replaced with another amino acid" is a 
product-by-process type limitation. The end result of the products of the claims is a 
serine protease domain. Whether the product of the claimed protein is obtained by 




replacing a free cysteine residue or not, the product is still the same because the instant 
claims may be produced by the recited modification or not. Therefore, there is no 
structure implied by said limitations. Therefore, the serine protease domain of O'Brien 
et al. is a single chain polypeptide. 

Hence the rejections are maintained. 

Claims 35-36, 40-42 and 113-114 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over O'Brien et al. 

Claims 35-36 are drawn to a conjugate comprising a polypeptide comprising a 
serine protease domain of MTSP and a targeting agent. Claims 40-42 and 113-114 are 
drawn to a solid support comprising a polypeptide comprising a serine protease domain 
of MTSP. 

O'Brien et al. (U.S. Patent No. 5,972,616 - reference P- PTO 1449) teaches a 
polypeptide having 100% identity to the full length MTSP1 of SEQ ID NO:2 of the instant 
invention, as discussed above. O'Brien et al. also teaches that the protease domain 
could be released and be used as a diagnostic which has the potential for a target for 
therapeutic intervention (Column 15, lines 35-38). 

O'Brien et al. also teaches a method of making fragments of SEQ ID NO:2 
(Column 9, lines 22-55). O'Brien et al. teaches said fragments linked to another 
polypeptide (Column 9, lines 54-55) and conjugated to bridging molecules (Column 6, 




lines 27-39) for detecting the polypeptide. Assays using polypeptides linked to the 
molecules taught by O'Brien et al. utilize solid supports. 

Therefore, it would have been obvious to one having ordinary skill in the art at 
the time the claimed invention was made to make a polypeptide comprising the 
serine protease domain of SEQ ID NO:2 taught by O'Brien et al. and to make 
conjugates and solid supports comprising a polypeptide comprising the serine 
protease domain of SEQ ID NO:2. The motivation for making such a polypeptide is to 
use it as a diagnostic which has the potential for a target for therapeutic intervention. 
The motivation for making conjugates and solid supports comprising said polypeptide 
is to use the conjugate and solid support in a variety of diagnostic assays. One of 
ordinary skill in the art would have had a reasonable expectation of success since making 
fragments of a polypeptide is routine in the art and O'Brien et al. teaches how to make 
fragments of SEQ ID NO:2. One of ordinary skill in the art would have had a 
reasonable expectation of success since diagnostic assays using conjugates and solid 
supports comprising a polypeptide are very well known, as taught by O'Brien et al. 

Therefore, the above references render claims 35-36 and 40-42 prima facie 
obvious to one of ordinary skill in the art. 

In response to the previous Office Action, applicants have traversed the above 
rejection, which has been discussed above. 

Hence the rejection is maintained. 




The rejection of claims 19-20 under 35 U.S.C. 103(a) as being unpatentable over 
O'Brien et al. and Estell et al. in view of Takeuchi et al. has been withdrawn. 



Conclusion 

None of the claims are in condition for allowance. 



THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time 
policy as set forth in 37 CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of 
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the mailing date of this final action. 



Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Yong Pak whose telephone number is 571-272-0935. 
The examiner can normally be reached 6:30 A.M. to 5:00 P.M. Monday through 
Thursday. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Nashaat Nashed can be reached on 571-272-0934. The fax phone number 
for the organization where this application or proceeding is assigned is 571-273-8300. 

Any inquiry of a general nature or relating to the status of this application or 
proceeding should be directed to the receptionist whose telephone number is 
571-272-1600. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 




you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll free). 



/Yong D Pak/ 

Primary Examiner, Art Unit 1652 



Exhibit 2 






United States Patent and Trademark Office 



UNITED STATES DEPARTMENT OF COMMERCE 
United States Patent and Trademark Office 
Address: COMMISSIONER FOR PATENTS 
P.O. Box 1450 
Alexandria, Virginia 22313-1450 
www.uspto.gov 



APPLICATION NO.: 09/776,191 
FILING DATE: 02/02/2001 
FIRST NAMED INVENTOR: Edwin L. Madison 
ATTORNEY DOCKET NO.: 17106-017001 / 1607 
CONFIRMATION NO.: 3237 

20985 7590 06/25/2007 

FISH & RICHARDSON, PC 
P.O. BOX 1022 
MINNEAPOLIS, MN 55440-1022 

EXAMINER: PAK, YONG D 
ART UNIT: 1652 
PAPER NUMBER: 
MAIL DATE: 06/25/2007 
DELIVERY MODE: PAPER 



Please find below and/or attached an Office communication concerning this application or proceeding. 



The time period for reply, if any, is set in the attached communication. 



PTOL-90A (Rev. 04/07) 



Office Action Summary 

Application No.: 09/776,191 
Applicant(s): MADISON ET AL. 
Examiner: Yong D. Pak 
Art Unit: 1652 





- The MAILING DATE of this communication appears on the cover sheet with the correspondence address - 
Period for Reply 

A SHORTENED STATUTORY PERIOD FOR REPLY IS SET TO EXPIRE 3 MONTH(S) OR THIRTY (30) DAYS, 
WHICHEVER IS LONGER, FROM THE MAILING DATE OF THIS COMMUNICATION. 

- Extensions of time may be available under the provisions of 37 CFR 1.136(a). In no event, however, may a reply be timely filed 
after SIX (6) MONTHS from the mailing date of this communication. 

- If NO period for reply is specified above, the maximum statutory period will apply and will expire SIX (6) MONTHS from the mailing date of this communication. 

- Failure to reply within the set or extended period for reply will, by statute, cause the application to become ABANDONED (35 U.S.C. § 133). 
Any reply received by the Office later than three months after the mailing date of this communication, even if timely filed, may reduce any 
earned patent term adjustment. See 37 CFR 1.704(b). 

Status 

1) ☒ Responsive to communication(s) filed on 23 March 2007. 
2a) □ This action is FINAL. 2b) ☒ This action is non-final. 

3) □ Since this application is in condition for allowance except for formal matters, prosecution as to the merits is 
closed in accordance with the practice under Ex parte Quayle, 1935 C.D. 11, 453 O.G. 213. 

Disposition of Claims 

4) ☒ Claim(s) See Continuation Sheet is/are pending in the application. 

4a) Of the above claim(s) 10, 43-46, 48-55, 108, 109, 115, 116, 118-120 and 122-126 is/are withdrawn from 
consideration. 

5) □ Claim(s) ____ is/are allowed. 

6) ☒ Claim(s) 1-3, 11-13, 19, 20, 34-36, 40-42, 113 and 114 is/are rejected. 
7) □ Claim(s) ____ is/are objected to. 

8) □ Claim(s) ____ are subject to restriction and/or election requirement. 

Application Papers 

9) □ The specification is objected to by the Examiner. 

10) □ The drawing(s) filed on ____ is/are: a) □ accepted or b) □ objected to by the Examiner. 

Applicant may not request that any objection to the drawing(s) be held in abeyance. See 37 CFR 1.85(a). 

Replacement drawing sheet(s) including the correction is required if the drawing(s) is objected to. See 37 CFR 1.121(d). 
11) □ The oath or declaration is objected to by the Examiner. Note the attached Office Action or form PTO-152. 

Priority under 35 U.S.C. § 119 

12) □ Acknowledgment is made of a claim for foreign priority under 35 U.S.C. § 119(a)-(d) or (f). 
a) □ All b) □ Some * c) □ None of: 

1. □ Certified copies of the priority documents have been received. 

2. □ Certified copies of the priority documents have been received in Application No. ____. 

3. □ Copies of the certified copies of the priority documents have been received in this National Stage 
application from the International Bureau (PCT Rule 17.2(a)). 
* See the attached detailed Office action for a list of the certified copies not received. 



Attachment(s) 

1) ☒ Notice of References Cited (PTO-892) 

2) □ Notice of Draftsperson's Patent Drawing Review (PTO-948) 

3) □ Information Disclosure Statement(s) (PTO/SB/08) 
Paper No(s)/Mail Date ____. 

4) □ Interview Summary (PTO-413) 
Paper No(s)/Mail Date ____. 

5) □ Notice of Informal Patent Application 

6) □ Other: ____. 



U.S. Patent and Trademark Office 

PTOL-326 (Rev. 08-06) 



Office Action Summary 



Part of Paper No./Mail Date 20070104 



Continuation Sheet (PTOL-326) 



Application No. 09/776,191 



Continuation of Disposition of Claims: Claims pending in the application are 1-3, 10-13, 19, 20, 34-36, 40-46, 48-55, 108, 109, 
113-116, 118-120 and 122-126. 



Application/Control Number: 09/776,191 Page 2 

Art Unit: 1652 

DETAILED ACTION 

The petition of March 23, 2007 is being treated as a request for reconsideration. 
In view of said request, the finality of the previous Office action is withdrawn, rendering 
the petition moot. A new action on the merits is set forth below. 

This application is a CIP of 09/657,986, now issued as U.S. Patent No. 
6,797,504. 

The amendment filed on October 23, 2006, amending claims 1, 12, 13 and 19 
and canceling claim 5, has been entered. 

Claims 1-3, 10-13, 19-20, 34-36, 40-46, 48-55, 108-109, 113-116, 118-120 and 
122-126 are pending. Claims 10, 43-46, 48-55, 108-109, 115-116, 118-120 and 
122-126 are withdrawn. Claims 1-3, 11-13, 19-20, 34-36, 40-42 and 113-114 are under 
consideration. 

Priority 

Applicant's claim for domestic priority under 35 U.S.C. 119(e) is acknowledged. 
However, the provisional applications upon which priority is claimed fails to provide 
adequate support under 35 U.S.C. 112 for claims 11-13 and 34 of this application. 

Provisional applications 60/179,982, 60/183,542, 60/213,124, 60/220,970 and 
60/234,840 fail to provide adequate support for polypeptides comprising the serine 
protease domain of MTSP1. Provisional applications 60/179,982 and 60/183,542 




describe polypeptides related to MTSP3 and provisional applications 60/213,124, 
60/220,970 and 60/234,840 describe polypeptides related to MTSP4. 

Therefore, the effective filing date for purpose of prior art is the filing date of 
09/657,986, which is 9/8/2000. 

Response to Arguments 

Applicant's amendment and arguments filed on October 23, 2006, have been 
fully considered and are deemed to be persuasive to overcome the rejections previously 
applied. Rejections and/or objections not reiterated from previous office actions are 
hereby withdrawn. 

Claim Objections 

Claims 11-13 and 34 are objected to for being drawn to non-elected subject matter. 
In response to the previous Office Action, applicants have traversed the above rejection. 
Applicants argue that claims 11-13 and 34 should be retained pending a determination 
of the allowability of claim 1, which is a linking claim, linking the elected subject matter. 
Since claim 1 has not been indicated as allowable, the objection is maintained. 

Claim Rejections - 35 USC §112 

The following is a quotation of the second paragraph of 35 U.S.C. 112: 

The specification shall conclude with one or more claims particularly pointing out and distinctly 
claiming the subject matter which the applicant regards as his invention. 




Claims 1-3, 11-12, 13 and claims 19-20, 34-36, 40-42 and 113-114 depending 
therefrom are rejected under 35 U.S.C. 112, second paragraph, as being indefinite for 
failing to particularly point out and distinctly claim the subject matter which applicant 
regards as the invention. 

Claims 1-3, 11-12, 13 recite the phrase "substantially purified single-chain 
polypeptide". The metes and bounds of the phrase in the context of the above claims 
are not clear to the Examiner. It is not clear to the Examiner what is considered as 
"substantially purified" by the applicants. A perusal of the specification did not provide a 
clear definition for the above phrase. Without a clear definition, those skilled in the art 
would be unable to conclude if a polypeptide is a "substantially purified" polypeptide 
without knowing the metes and bounds of the phrase. Examiner requests clarification of 
the above phrase. 

In response to the previous Office Action, applicants have traversed the above 
rejection. 

Applicants argue that when read in light of the specification, the skilled artisan 
would understand the meaning of the recitation "substantially purified" and point to 
page 46, lines 4-15 of the specification for the definition of the phrase "substantially 
purified". Examiner respectfully disagrees. The specification on page 46, lines 4-15, 
does not define what applicants mean by "substantially purified", but only describes that 
"substantially pure means sufficiently homogeneous to appear free of readily detectable 
impurities as determined by standard methods of analysis". Since there is no clear 
guidance to one having ordinary skill in the art in qualifying the purity of an enzyme by 




ascertaining whether it is free of readily detectable impurities, it is not clear to the 
Examiner as to how much of a presence of these readily detectable impurities qualifies 
an enzyme to be "substantially pure". Therefore, those skilled in the art would be 
unable to conclude what polypeptides are "substantially purified". 
Hence the rejection is maintained. 

The following is a quotation of the first paragraph of 35 U.S.C. 112: 

The specification shall contain a written description of the invention, and of the manner and process of 
making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the 
art to which it pertains, or with which it is most nearly connected, to make and use the same and shall 
set forth the best mode contemplated by the inventor of carrying out his invention. 

Claims 1-3, 11, 19-20, 34-36, 40-42 and 113-114 are rejected under 35 
U.S.C. 112, first paragraph, as containing subject matter which was not described in the 
specification in such a way as to reasonably convey to one skilled in the relevant art that 
the inventor(s), at the time the application was filed, had possession of the claimed 
invention. 

Claims 1-3, 11, 19-20, 35-36, 40-42 and 113-114 are drawn to a polypeptide 
consisting of a protease domain or catalytically active fragment thereof of type-II 
membrane-type serine protease (MTSP) from any source. Claims 11 and 34 limit the 
MTSP polypeptide to a MTSP1 polypeptide from any source. Therefore, these claims 
are drawn to a genus of polypeptides having any structure. The specification only 
teaches four species, amino acids 615-855 of SEQ ID NO:2 (MTSP1), amino acids 
205-437 of SEQ ID NO:4 (MTSP3), amino acids of SEQ ID NO:6 (MTSP4) and amino 
acids 217-443 of SEQ ID NO:11 (MTSP6). These species are not enough to describe 




the whole genus and there is no evidence on the record of the relationship between the 
structure of the above catalytically active protease domains of SEQ ID NOs: 2, 4, 6 and 
11 and the structure of the serine protease domain of any or all MTSP polypeptides or 
MTSP1 polypeptides. Further, the specification does not describe the structure of a 
catalytically active fragment of a protease domain of any or all MTSP polypeptide. 
Therefore, the specification fails to describe a representative species of the genus of 
polypeptides comprising of a serine protease domain or a catalytically active portion of a 
MTSP polypeptide. 

Given this lack of description of the representative species encompassed by the 
genus of the claims, the specification fails to sufficiently describe the claimed invention 
in such full, clear, concise, and exact terms that a skilled artisan would recognize that 
applicants were in possession of the inventions of claims 1-3, 11, 19-20, 34-36, 40-42 
and 113-114. 

Applicant is referred to the revised guidelines concerning compliance with the 
written description requirement of U.S.C. 112, first paragraph, published in the Official 
Gazette and also available at www.uspto.gov. 

In response to the previous Office Action, applicants have traversed the above 
rejection. 

Applicants argue that the claims are fully described by the specification because 
the structural feature, a single chain protease domain, is present in all members of the 
genus and is the defining and requisite property and the specification clearly describes 




this feature. Examiner respectfully disagrees. The recitation of "protease domain of a 
MTSP" or "MTSP1" fails to provide a sufficient description of the claimed genus of 
polynucleotides as it merely describes the functional features of the genus without 
providing any definition of the structural features of the species within the genus. The 
CAFC in UC California v. Eli Lilly (43 USPQ2d 1398) stated that: "in claims to genetic 
material, however a generic statement such as 'vertebrate insulin cDNA' or 'mammalian 
insulin cDNA,' without more, is not an adequate written description of the genus 
because it does not distinguish the claimed genus from others, except by function. It 
does not specifically define any of the genes that fall within its definition. It does not 
define any structural features commonly possessed by members of the genus that 
distinguish them from others. One skilled in the art therefore cannot, as one can do with 
a fully described genus, visualize or recognize the identity of the members of the 
genus." Similarly with the claimed genus of protease domains, the functional definition 
of the genus does not provide any structural information commonly possessed by 
members of the genus which distinguish the species within the genus from other 
proteins such that one can visualize or recognize the identity of the members of the 
genus. 

Applicants also argue that the claims are fully described because the 
specification describes known MTSPs and identifies the protease domains thereof, 
unknown MTSPs and its protease domains. Examiner respectfully disagrees. The 
claims are not limited to specific protease domains of specific MTSP proteins, but the 
claims are drawn to polypeptides comprising any protease domains or any or all 
catalytically active fragments of said protease domains of any or all MTSP or any or all 
MTSP1, including any or all recombinants, variants and mutants of said MTSP or 
MTSP1. As discussed in the written description guidelines, the written description 
requirement for a claimed genus may be satisfied through sufficient description of a 
representative number of species by actual reduction to practice, reduction to drawings, 
or by disclosure of relevant, identifying characteristics, i.e., structure or other physical 
and/or chemical properties, by functional characteristics coupled with a known or 
disclosed correlation between function and structure, or by a combination of such 
identifying characteristics, sufficient to show the applicant was in possession of the 
claimed genus. A representative number of species means that the species which are 
adequately described are representative of the entire genus. Thus, when there is 
substantial variation within the genus, one must describe a sufficient variety of 
species to reflect the variation within the genus. Satisfactory disclosure of a 
representative number depends on whether one of skill in the art would recognize that 
the applicant was in possession of the necessary common attributes or features of the 
elements possessed by the members of the genus in view of the species disclosed. For 
inventions in an unpredictable art, adequate written description of a genus which 
embraces widely variant species cannot be achieved by disclosing only one species 
within the genus. In the instant case the claimed genera of the claims are drawn to 
species which are widely variant in structure. The genus of the claims are structurally 
diverse as it encompasses any catalytically active protease domains of any or all MTSP 
or MTSP1, excepting having serine protease activity. As such, neither the description of 
solely structural features present in all members of the genus is sufficient to be 
representative of the attributes and features of the entire genus. 

Applicants also argue that the specification provides "relevant, identifying 
characteristics" of a representative number of species of the claimed genus. Examiner 
respectfully disagrees. The claims are drawn to polypeptides comprising any protease 
domains or any or all catalytically active fragments of said protease domains of any or 
all MTSP or any or all MTSP1, including any or all recombinants, variants and mutants 
of said MTSP or MTSP1. The claims are drawn to polypeptides having any structure 
and therefore, the claims are drawn to a genus encompassing species having 
substantial variation and fails to describe a representative number of species. As 
discussed in the written description guidelines, the written description requirement for a 
claimed genus may be satisfied through sufficient description of a representative 
number of species by actual reduction to practice, reduction to drawings, or by 
disclosure of relevant, identifying characteristics, i.e., structure or other physical and/or 
chemical properties, by functional characteristics coupled with a known or disclosed 
correlation between function and structure, or by a combination of such identifying 
characteristics, sufficient to show the applicant was in possession of the claimed genus. 
A representative number of species means that the species which are adequately 
described are representative of the entire genus. Thus, when there is substantial 
variation within the genus, one must describe a sufficient variety of species to 
reflect the variation within the genus. Satisfactory disclosure of a representative 
number depends on whether one of skill in the art would recognize that the applicant 
was in possession of the necessary common attributes or features of the elements 
possessed by the members of the genus in view of the species disclosed. For 
inventions in an unpredictable art, adequate written description of a genus which 
embraces widely variant species cannot be achieved by disclosing only one species 
within the genus. In the instant case the claimed genera of the claims are drawn to 
species which are widely variant in structure. The genus of the claims are structurally 
diverse as it encompasses any catalytically active protease domains of any or all MTSP 
or MTSP1, excepting having serine protease activity. As such, neither the description of 
solely structural features present in all members of the genus is sufficient to be 
representative of the attributes and features of the entire genus. 

Applicants also argue that the claims are fully described because the specification 
provides at least a dozen examples of protease domains of MTSPs. Examiner 
respectfully disagrees. The claims are not drawn to the specific protease domains of 
the MTSPs disclosed in the specification, but to polypeptides consisting of any protease 
domains or any or all catalytically active fragments of said protease domains of any or 
all MTSP or any or all MTSP1, including any or all recombinants, variants and mutants 
of said MTSP or MTSP1. In view of the widely variant species encompassed by the 
genus, the species disclosed in the specification is not enough and does not constitute 
a representative number of species to describe the whole genus of any or all variants, 
recombinant and mutants of any or all polypeptides having serine protease activity 
isolated from any or all source, including any or all variants, recombinants and mutants 
thereof, and there is no evidence on the record of the relationship between the structure 
of the protease domain of the specific MTSPs disclosed in the specification and the 
structure of any or all recombinant, variant and mutant of any or all polypeptides having 
serine protease activity. Therefore, the specification fails to describe a representative 
species of the genus comprising any or all polypeptides having serine protease activity, 
including any or all variants, recombinants and mutants thereof. 
Hence the rejection is maintained. 

Claims 1-3, 11, 19-20, 34-36, 40-42 and 113-114 are rejected under 35 
U.S.C. 112, first paragraph, because the specification, while being enabling for a 
polypeptide consisting of amino acids 615-855 of SEQ ID NO:2, does not reasonably 
provide enablement for a polypeptide comprising any protease domain of any type II 
membrane type serine protease (MTSP) or MTSP1 or a catalytically active portion 
thereof. The specification does not enable any person skilled in the art to which it 
pertains, or with which it is most nearly connected, to make and use the invention 
commensurate in scope with these claims. 

Factors to be considered in determining whether undue experimentation is 
required are summarized in In re Wands, 858 F.2d 731, 8 USPQ2d 1400 (Fed. Cir. 
1988). They include (1) the quantity of experimentation necessary, (2) the amount of 
direction or guidance presented, (3) the presence or absence of working examples, (4) 
the nature of the invention, (5) the state of the prior art, (6) the relative skill of those in 
the art, (7) the predictability or unpredictability of the art, and (8) the breadth of the 
claims. 



Claims 1-3, 11, 19-20, 35-36, 40-42 and 113-114 are drawn to a polypeptide 
consisting of a protease domain or catalytically active fragment thereof of a type-II 
membrane-type serine protease (MTSP) from any source. Claims 11 and 34 limit the 
MTSP polypeptide to a MTSP1 polypeptide from any source. Therefore, these claims 
are drawn to polypeptides having undefined structure. 

The scope of the claims is not commensurate with the enablement provided by 
the disclosure with regard to the extremely large number of polypeptides comprising a 
protease or catalytically active domain broadly encompassed by the claims. Since the 
amino acid sequence of a protein determines its structural and functional properties, 
predictability of which changes can be tolerated in a protein's amino acid sequence and 
obtain the desired activity requires a knowledge of and guidance with regard to which 
amino acids in the protein's sequence, if any, are tolerant of modification and which are 
conserved (i.e. expectedly intolerant to modification), and detailed knowledge of the 
ways in which the protein's structure relates to its function. However, in this case the 
disclosure is limited to the polypeptide comprising amino acids 615-855 of SEQ ID 
NO:2, or the amino acids of SEQ ID NO:50. 

It would require undue experimentation of the skilled artisan to make and use the 
claimed polypeptides. The specification is limited to teaching the use of polypeptide 
comprising amino acids 615-855 of SEQ ID NO:2 or the amino acids of SEQ ID NO:50 
but provides no guidance with regard to the making of variants and mutants or with 
regard to other uses. In view of the great breadth of the claim, amount of 
experimentation required to make the claimed polypeptides, the lack of guidance, 
working examples, and unpredictability of the art in predicting function from a 
polypeptide primary structure, the claimed invention would require undue 
experimentation. As such, the specification fails to teach one of ordinary skill how to 
use the full scope of the polypeptides encompassed by the claims. 

While enzyme isolation techniques, recombinant and mutagenesis techniques 
are known, and it is routine in the art to screen for multiple substitutions or multiple 
modifications as encompassed by the instant claims, the specific amino acid positions 
within a protein's sequence where amino acid modifications can be made with a 
reasonable expectation of success in obtaining the desired activity/utility are limited in 
any protein and the result of such modifications is unpredictable. In addition, one skilled 
in the art would expect any tolerance to modification for a given protein to diminish with 
each further and additional modification, e.g. multiple substitutions. 

The specification does not support the broad scope of the claims which 
encompass all modifications and variants of a protease or catalytically active domain or 
modifications of amino acids 615-855 of SEQ ID NO:2 because the specification does 
not establish: (A) regions of the protein structure which may be modified without 
affecting MTSP/serine protease activity; (B) the general tolerance of MTSP to 
modification and extent of such tolerance; (C) a rational and predictable scheme for 
modifying any amino acid residue with an expectation of obtaining the desired biological 
function; and (D) the specification provides insufficient guidance as to which of the 
essentially infinite possible choices is likely to be successful. 



Thus, applicants have not provided sufficient guidance to enable one of ordinary 
skill in the art to make and use the claimed invention in a manner reasonably correlated 
with the scope of the claims broadly including protease or catalytically active domains of 
MTSP with an enormous number of amino acid modifications of the MTSP polypeptides 
and of amino acids 615-855 of SEQ ID NO:2. The scope of the claims must bear a 
reasonable correlation with the scope of enablement (In re Fisher, 166 USPQ 19, 24 
(CCPA 1970)). Without sufficient guidance, determination of the serine protease 
domain or the catalytically active domain of MTSP having the desired biological 
characteristics is unpredictable and the experimentation left to those skilled in the art is 
unnecessarily, and improperly, extensive and undue. See In re Wands, 858 F.2d 731, 8 
USPQ2d 1400 (Fed. Cir. 1988). 

In response to the previous Office Action, applicants have traversed the above 
rejection. 

Applicants argue that the claims are enabled because the level of skill in the art 
is high and the specification teaches that MTSP polypeptides constitute a recognized 
well-known and well characterized family of serine protease and the specification 
describes the protease domain of a number of MTSP family members, such as 
conserved features of MTSP protease domains. Examiner respectfully disagrees. The 
scope of the claims, which are drawn to polypeptides comprising any protease domains 
or any or all catalytically active fragments of said protease domains of any or all MTSP 
or any or all MTSP1, including any or all recombinants, variants and mutants of said 
MTSP or MTSP1, is not commensurate with the enablement provided by the disclosure 
with regard to the extremely large number of polypeptides comprising a protease or 
catalytically active domain broadly encompassed by the claims. Even though the 
structure of some MTSP are known, the claims are drawn to any or all serine domains 
and catalytically active fragments of any or all protease domains of any or all MTSP or 
MTSP1. As discussed above, predictability of which changes can be tolerated in a 
protein's amino acid sequence and obtain the desired activity requires a specific 
knowledge of and guidance with regard to which specific amino acids in the protein's 
sequence, can be modified such that the modified polypeptide continues to have said 
claimed activity. It is this specific guidance that applicants do not provide. While the art 
may teach in general the structure of MTSP, conserved amino acid sequences, protease 
domains, X-ray crystal structure, etc., such teachings will not reduce the burden of 
undue experimentation on those of ordinary skill in the art. 

Applicants also argue that the claims are enabled because the knowledge, 
regarding MTSP proteins, of those skilled in the art is high. The Examiner respectfully 
disagrees. The claims are drawn to polypeptides comprising any protease domains or 
any or all catalytically active fragments of said protease domains of any or all MTSP or 
any or all MTSP1, including any or all recombinants, variants and mutants of said MTSP 
or MTSP1. Since the amino acid sequence of the protein determines its structural and 
functional properties, predictability of which changes can be tolerated in a protein's 
amino acid sequence and obtain the desired activity requires a knowledge of and 
guidance with regard to which amino acids in the protein's sequence, if any, are tolerant 
of modification and which are conserved (i.e. expectedly intolerant to modification), and 
detailed knowledge of the ways in which the protein's structure relates to its function. In 
addition, the art does not provide any teaching or guidance as to which amino acids 
within a serine protease can be modified and which ones are conserved such that one 
of skill in the art can make the recited polypeptides having serine protease activity and 
the general tolerance of serine proteases to structural modifications and the extent of 
such tolerance. The art clearly teaches that changes in a protein's amino acid 
sequence to obtain the desired activity without any guidance/knowledge as to which 
amino acids in a protein are required for that activity is highly unpredictable. At the time 
of the invention, there was a high level of unpredictability associated with altering a 
polypeptide sequence with an expectation that the polypeptide will maintain the desired 
activity. For example, Branden et al. (Introduction to Protein Structure, Garland 
Publishing Inc., New York, page 247, 1991) teach that (1) protein engineers are 
frequently surprised by the range of effects caused by single mutations that they hoped 
would change only one specific and simple property in enzymes, (2) the often surprising 
results obtained by experiments where single mutations are made reveal how little is 
known about the rules of protein stability, and (3) the difficulties in designing de novo 
stable proteins with specific functions. 

Applicants argue that the specification discloses working examples, thus a 
person skilled in the art has sufficient guide in making the claimed polypeptides. 
Examiner respectfully disagrees. Even though the structure of some MTSP are taught, 
the claims are not only drawn to polypeptides comprising catalytically active fragments 
of only MTSP1, MTSP3, MTSP4 and MTSP6, but to any or all mutants, variants and 
recombinants of any MTSP. Without specific guidance, those skilled in the art will be 
subjected to undue experimentation of making and testing each of the enormously large 
number of mutants that results from such experimentation. While the art may teach in 
general the structure of MTSP, conserved amino acid sequences, etc., such 
teachings will not reduce the burden of undue experimentation on those of ordinary skill 
in the art. 

Hence the rejection is maintained. 



Claim Rejections - 35 USC § 102 
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(a) the invention was known or used by others in this country, or patented or described in a printed 
publication in this or a foreign country, before the invention thereof by the applicant for a patent. 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

(e) the invention was described in (1) an application for patent, published under section 122(b), by 
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States 
only if the international application designated the United States and was published under Article 21(2) 
of such treaty in the English language. 



Claims 1-3 and 19-20 are rejected under 35 U.S.C. 102(b) as being anticipated 
by Dawson et al. 

Claims 1-3 and 19-20 are drawn to a polypeptide consisting of a serine protease 
domain of MTSP or catalytically active fragments thereof. 



Dawson et al. (US Patent 5,465,833 - form PTO-892) discloses a polypeptide 
consisting of serine protease domain or a catalytically active fragment thereof of a 
MTSP protein, hepsin (Figure 1). Therefore, the reference of Dawson et al. anticipates 
claims 1-3 and 19-20. 

Claims 1-3, 11-13, 19-20, 34-36, 40-42 and 113-114 are rejected under 35 
U.S.C. 102(b) as being anticipated by Takeuchi et al. 

Claims 1-3, 11-13, 19-20 and 34 are drawn to a polypeptide comprising fragment 
consisting of a serine protease domain of MTSP having the characteristics recited in the 
claims. Claims 35-36 are drawn to a conjugate comprising a polypeptide comprising a 
serine protease domain of MTSP and a targeting agent. Claims 40-42 and 113-114 
are drawn to a solid support comprising a polypeptide comprising a serine protease 
domain of MTSP. 

Takeuchi et al. (Reference IJ: PTO-1449) teaches a polypeptide comprising a 
fragment consisting of a serine protease domain that is 100% identical to amino acids 
615-855 of SEQ ID NO:2 of the instant invention (page 11060, 2nd full paragraph). 
Takeuchi et al. discloses a purified activated protease domain, comprising amino acids 
615-855 of SEQ ID NO:2, confirmed by an N-terminal sequence of the purified, 
activated protease domain yielding the expected WGGT sequence (Figure 3 and right 
column on page 11057). The MTSP of Takeuchi et al. is not expressed on normal 
endothelial cells (page 11054, last paragraph and page 11055, 2nd full paragraph), is of 
human origin (Figure 1), consists essentially of the protease domain having catalytic 
activity (page 11060, 2nd full paragraph), and is expressed in tumor cells (page 11055, 
top paragraph). 

Takeuchi et al. teaches a catalytically active polypeptide comprising the serine 
protease domain linked to a His-tag (page 11055, 3rd full paragraph, page 11057, 4th full 
paragraph). Takeuchi et al. also teaches a solid support comprising said polypeptide 
(page 11057, 4th full paragraph and Figure 5). Therefore, the teaching of Takeuchi et 
al. anticipates claims 1-3, 11-13, 19-20, 34-36, 40-42 and 113-114. 

Examiner notes that the contents of the reference were made public at the 
National Academy of Sciences colloquium held February 20-21, 1999 (see top of 
reference). 

In response to the previous Office Action, applicants have traversed the above 
rejections. 

Applicants argue that Takeuchi et al. does not anticipate the instant claims 
because the instant claims are drawn to a polypeptide that consists of a protease 
domain or catalytically active portion thereof. Examiner respectfully disagrees. In 
addition to the full-length MT-SP1, Takeuchi et al. also discloses a purified activated 
protease domain, consisting of amino acids 615-855 of SEQ ID NO:2, confirmed by an 
N-terminal sequence of the purified, activated protease domain yielding the expected 
WGGT sequence (Figure 3 and right column on page 11057). Therefore, said 
purified, activated protease domain anticipates the instant claims. 






Applicants also argue that Takeuchi et al. does not anticipate the instant claims 
because the claimed polypeptide is a single chain polypeptide. Examiner respectfully 
disagrees. As discussed above, Takeuchi et al. discloses a purified activated protease 
domain, consisting of amino acids 615-855 of SEQ ID NO:2, confirmed by an N-terminal 
sequence of the purified, activated protease domain yielding the expected WGGT 
sequence (Figure 3 and right column on page 11057). 
Hence the rejections are maintained. 



Claim Rejections - 35 USC § 102/103 

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(e) the invention was described in (1) an application for patent, published under section 122(b), by 
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States 
only if the international application designated the United States and was published under Article 21(2) 
of such treaty in the English language. 

The following is a quotation of 35 U.S.C. 103(a), which forms the basis for all 
obviousness rejections, set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior 
art are such that the subject matter as a whole would have been obvious at the time the invention was made to 
a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be 
negatived by the manner in which the invention was made. 



Claims 1-3, 11-13 and 34 are rejected under 35 U.S.C. 103(a) as obvious over 
O'Brien et al. 



Claims 1-3, 11-13 and 34 are drawn to a polypeptide comprising a serine 
protease domain of MTSP. 

O'Brien et al. (U.S. Patent No. 5,972,616 - reference P - PTO-1449) teaches a 
polypeptide having 100% identity to the full length MTSP1 of SEQ ID NO:2 of the instant 
invention (SEQ ID NO:2, columns 19-24). O'Brien et al. teaches a serine protease 
domain having proteolytic activity that is 100% identical to amino acids 615-855 of SEQ 
ID NO:2 (Figure 2, Figure 10 and SEQ ID NO:14). Further, O'Brien et al. teaches a 
method of expressing polypeptides via a vector in host cells. O'Brien et al. also teaches 
that the protease domain could be released the used as a diagnostic which has the 
potential for a target for therapeutic intervention (Column 15, lines 35-38). Therefore, it 
would have been obvious to one having ordinary skill in the art at the time the invention 
was made to express the protease domain of SEQ ID NO:14 and purify the polypeptide. 
The motivation of making such a polypeptide is to use it as a diagnostic which has the 
potential for a target for therapeutic intervention. One of ordinary skill in the art would 
have had a reasonable expectation of success since expression of a heterologous 
polypeptide is routine in the art and O'Brien et al. teaches how to express heterologous 
polypeptides. 

Therefore, the above reference renders claims 1-3, 11-13 and 34 prima facie 
obvious to one of ordinary skill in the art. 

In response to the previous Office Action, applicants have traversed the above 
rejections. 



Applicants also argue that one of skill in the art would recognize the disclosure of 
the polypeptide of O'Brien as not disclosing a single chain polypeptide. Examiner 
respectfully disagrees. A single chain polypeptide is one sequence of amino acids 
beginning with a carboxyl end and terminating with an amino end, wherein the amino 
acids are connected via peptide bonds. Therefore, the protease domain obtained from 
O'Brien et al. can be construed as a single chain polypeptide. 

Applicants also argue that O'Brien et al. provides no teaching or suggestion of 
smaller fragments having serine protease activity because it does not teach how to 
make a single chain polypeptide that has serine protease activity. Examiner respectfully 
disagrees. O'Brien et al. teaches a method of expressing polypeptides via a vector in 
host cells. It is well within the skill available in the art to purify the protease domain 
since O'Brien et al. identifies the protease domain. Therefore, it would have been 
obvious to one having ordinary skill in the art at the time the invention was made to 
express the protease domain of SEQ ID NO:14 and purify the polypeptide. The 
motivation of making such a polypeptide is to use it as a diagnostic which has the 
potential for a target for therapeutic intervention. One of ordinary skill in the art would 
have had a reasonable expectation of success since expression of a heterologous 
polypeptide is routine in the art and O'Brien et al. teaches how to express heterologous 
polypeptides. 

Applicants again argue that at the time of filing the instant application, one of skill 
in the art would not have had a reasonable expectation of success to express the 
protease domain because art evidences that a single-chained polypeptide would not 
have been expected to have protease activity. Examiner respectfully disagrees. The 
claims are drawn to a polypeptide comprising a fragment consisting of a protease 
domain of SEQ ID NO:2. Therefore, said polypeptide being a single-chained 
polypeptide is an inherent property of said polypeptide since two polypeptides having 
identical structure will have identical function and physical and chemical properties. 
Hence the rejections are maintained. 

Claims 35-36, 40-42 and 113-114 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over O'Brien et al. 

Claims 35-36 are drawn to a conjugate comprising a polypeptide comprising a 
serine protease domain of MTSP and a targeting agent. Claims 40-42 and 113-114 are 
drawn to a solid support comprising a polypeptide comprising a serine protease domain 
of MTSP. 

O'Brien et al. (U.S. Patent No. 5,972,616 - reference P - PTO-1449) teaches a 
polypeptide having 100% identity to the full length MTSP1 of SEQ ID NO:2 of the instant 
invention, as discussed above. O'Brien et al. also teaches that the protease domain 
could be released the used as a diagnostic which has the potential for a target for 
therapeutic intervention (Column 15, lines 35-38). 

O'Brien et al. also teaches method of making fragments of SEQ ID NO:2 
(Column 9, lines 22-55). O'Brien et al. teaches said fragments linked to another 
polypeptide (Column 9, lines 54-55) and conjugated to bridging molecules (Column 6, 
lines 27-39) for detecting the polypeptide. Assays using polypeptides linked to the 
molecules taught by O'Brien et al. utilize solid supports. 

Therefore, it would have been obvious to one having ordinary skill in the art at 
the time the claimed invention was made to make a polypeptide comprising of the 
serine protease domain of SEQ ID NO:2 taught by O'Brien et al. and to make 
conjugates and solid support comprising of a polypeptide comprised of the serine 
protease domain of SEQ ID NO:2. The motivation of making such a polypeptide is to 
use it as a diagnostic which has the potential for a target for therapeutic intervention. 
The motivation of making conjugates and solid supports comprising of said polypeptide 
is to use the conjugate and solid support in a variety of diagnostic assays. One of 
ordinary skill in the art would have had a reasonable expectation of success since making 
fragments of a polypeptide is routine in the art and O'Brien et al. teaches how to make 
fragments of SEQ ID NO:2. One of ordinary skill in the art would have had a 
reasonable expectation of success since diagnostic assays using conjugates and solid 
supports comprising a polypeptide are very well known, as taught by O'Brien et al. 

Therefore, the above references render claims 35-36 and 40-42 prima facie 
obvious to one of ordinary skill in the art. 

In response to the previous Office Action, applicants have traversed the above
rejections. Applicants argue that the teachings of O'Brien et al. do not result in the
instantly claimed compositions because O'Brien et al. does not teach or suggest a
single chain polypeptide that includes a MTSP protease domain where the polypeptide
does not include any additional MTSP portions and the polypeptide has serine protease
activity. O'Brien et al. does teach or suggest a single chain polypeptide comprising a
MTSP portion, wherein the MTSP portion is a protease domain and wherein the MTSP
portion has serine protease activity and wherein the MTSP portion is the only portion of
the polypeptide because O'Brien et al. identifies the serine protease domain and one
having ordinary skill in the art at the time the invention was filed would have been
motivated to purify the serine protease domain of O'Brien et al. as discussed above.

Hence the rejection is maintained. 

Claims 19-20 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
O'Brien et al. and Estell et al. in view of Takeuchi et al. 

Claims 19-20 are drawn to a polypeptide comprising the serine protease domain 
of a MTSP wherein free Cys residues are substituted with Ser residues. 

O'Brien et al. teaches a serine protease domain of a MTSP polypeptide, as 
discussed above. 

The reference of O'Brien et al. does not teach a serine protease domain of a
MTSP polypeptide wherein free Cys residues have been replaced with Ser residues.

It is well known in the art that proteins form disulfide bonds via the SH groups of
Cys residues. Upon making a polypeptide comprising a serine protease domain, a Cys
residue which normally forms disulfide bonds in the full length polypeptide may be left
free. For example, Takeuchi et al. (Reference IJ: PTO-1449) teaches that the cysteine at
position 731 of SEQ ID NO:2 normally forms a disulfide bond with a Cys residue in the
pro-protease domain (see page 11060, top left paragraph and Figures 1 and 2).

Cys residues are sensitive to oxidation due to their SH side group. Estell et al.
(U.S. Patent No. 5,346,823) teaches that Cys residues can be replaced with Ser residues to
decrease a polypeptide's susceptibility to oxidation (Abstract and Column 10, lines 34-38).
Ser residues have side chains similar to those of Cys residues, and substitution of a Cys
residue with a Ser residue is a conservative substitution.

Therefore, it would have been obvious to one having ordinary skill in the art at 
the time the claimed invention was made to replace free Cys residues in the protease 
domain taught by O'Brien et al. with a Ser residue. One of ordinary skill in the art would 
be motivated to make such a change in order to enhance stability of the polypeptide. 
One of ordinary skill in the art would have had a reasonable expectation of success 
since Estell et al. teaches successful decrease of a protein's susceptibility to oxidation 
by substituting residues sensitive to oxidation with conservative substitutions. 

Therefore, the above references render claims 1 and 16, 18-20, 34 and 137
prima facie obvious to one of ordinary skill in the art.

In response to the previous Office Action, applicants have traversed the above
rejections. Applicants argue that the combination of the teachings of O'Brien et al. with
the teachings of Estell et al. and Takeuchi et al. does not result in the instantly claimed
methods because O'Brien et al. does not teach or suggest a single chain polypeptide
that includes a MTSP protease domain where the polypeptide does not include any
additional MTSP portions and the polypeptide has serine protease activity and that
neither Takeuchi et al. nor Estell et al. remedies the defects of O'Brien et al. First, the
claims are product claims and not method claims. Second, O'Brien et al. does teach or
suggest a single chain polypeptide comprising a MTSP portion, wherein the MTSP
portion is a protease domain and wherein the MTSP portion has serine protease activity
and wherein the MTSP portion is the only portion of the polypeptide because O'Brien et
al. identifies the serine protease domain and one having ordinary skill in the art at the
time the invention was filed would have been motivated to purify the serine protease
domain of O'Brien et al. as discussed above.

Applicants argue that Takeuchi et al. teaches that every cysteine residue of the
protein is disulfide bonded and therefore Takeuchi et al. does not teach or suggest an
MTSP protease domain having a free Cys residue. Examiner respectfully disagrees.
Figure 4, to which applicants refer, illustrates the disulfide bonds of cysteine residues of
the full length MTSP; for example, the Cys at position 830 is disulfide bonded to the Cys at
position 191.

Hence the rejection is maintained. 

None of the claims are in condition for allowance. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Yong Pak whose telephone number is 571-272-0935. 
The examiner can normally be reached 6:30 A.M. to 5:00 P.M. Monday through 
Thursday. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Ponnathapu Achutamurthy, can be reached on 571-272-0928. The fax
phone number for the organization where this application or proceeding is assigned is
571-273-8300.




Any inquiry of a general nature or relating to the status of this application or
proceeding should be directed to the receptionist whose telephone number is
571-272-1600.

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll free). 



Yong D. Pak 

Patent Examiner 1652 



Notice of References Cited

Application/Control No.: 09/776,191
Applicant(s)/Patent Under Reexamination: MADISON ET AL.
Examiner: Yong D. Pak
Art Unit: 1652
Page 1 of 1

U.S. PATENT DOCUMENTS

     Document Number                  Date
     Country Code-Number-Kind Code    MM-YYYY    Name             Classification
* A  US-5,645,833                     07-1997    Dawson et al.    424/94.64

FOREIGN PATENT DOCUMENTS

No entries.

NON-PATENT DOCUMENTS
(Include as applicable: Author, Title, Date, Publisher, Edition or Volume, Pertinent Pages)

U  Branden et al. Introduction to Protein Structure, Garland Publishing Inc., New York, page 247, 1991

*A copy of this reference is not being furnished with this Office action. (See MPEP § 707.05(a).)
Dates in MM-YYYY format are publication dates. Classifications may be US or foreign.

U.S. Patent and Trademark Office
PTO-892 (Rev. 01-2001)    Notice of References Cited    Part of Paper No. 20070104



THE COVER 

Front: The background photograph of the cover is of a Laue x-ray diffraction
pattern produced by a crystal of the plant enzyme ribulose bisphosphate
carboxylase. This technique is described in Chapter 17. Information derived
from such x-ray patterns, together with a knowledge of the amino acid
sequence, enabled the three-dimensional arrangement of atoms in the protein
to be determined. A simplified representation of this protein structure is shown
in color, superimposed on the diffraction pattern. The enzyme, which is
involved in the fixation of carbon dioxide, is a member of the large class of
α/β barrel protein structures. This class of structures is discussed in detail in
Chapter 4.

Back: Tomato bushy stunt virus is a spherical virus made from 180 protein
subunits. Arms extending from sixty of these subunits contribute to an internal
framework that determines the size of the correctly assembled virus particle. The
interdigitated arms from three subunits meet at each of the twenty icosahedral
threefold axes of the virus. One such axis is shown here with the β strands from
three subunits shown in different shades of green. Virus structure is described
in more detail in Chapter 11.



© 1991 Carl Branden and John Tooze

All rights reserved. No part of this book covered by the
copyright hereon may be reproduced or used in any form or
by any means — graphic, electronic, or mechanical, including
photocopying, recording, taping, or information storage and
retrieval systems — without permission of the publisher.



Library of Congress Cataloging-in-Publication Data
Branden, Carl.

Introduction to protein structure / Carl Branden, John Tooze.
p. cm.
Includes index.

ISBN 0-8153-0344-0 -- ISBN 0-8153-0270-3 (pbk.)
1. Proteins—Structure. I. Tooze, John. II. Title.
QP551.B7635 1991
574.19'245—dc20 91-11788
CIP

Published by Garland Publishing, Inc.
136 Madison Ave., New York, New York, 10016

Printed in the United States of America

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1



Prediction, Engineering,
and Design of Protein
Structures

Over a period of more than 3 billion years a large variety of protein molecules
has evolved to run the complex machinery of present-day cells and organisms.
Most of us believe that these molecules have evolved by random mutation of
genes and natural selection for those gene products that have conferred some
functional advantage contributing to the survival of individual organisms.

Long before Darwin and Wallace proposed the theory of evolution and
Mendel discovered the laws of genetics, plant and animal breeders had begun
to interfere with the process of evolution in the species that gave rise to
domesticated animals and cultivated plants. Considering their total lack of
knowledge of both evolutionary theory and genetics, their achievements,
brought about by forcing the pace of and subverting natural selection, were
impressive albeit very gradual. With the advent of molecular genetics and in
particular techniques for gene cloning and gene insertion, we are now entering
an era of genetic exploitation of other organisms undreamed of only 50 years
ago. We can now begin to design genes to produce in other organisms novel
gene products for the benefit of human beings; we are no longer restricted to
selecting useful genes that arise by mutation. We are, however, only at the
beginning of this new era, and so far we have only scratched the surface of the
knowledge that is required for true engineering and design of protein molecules.

We distinguish protein engineering, by which we mean mutating the gene
of an existing protein in an attempt to alter its function in a predictable way, from
protein design, which has the more ambitious goal of designing de novo a
protein to fulfill a desired function.

Protein engineers frequently have been surprised by the range of effects
caused by single mutations that they hoped would change only one specific and
simple property in an enzyme; some examples are described in Chapter 15. The
often surprising results of such experiments reveal how little we know about the
rules of protein stability and the energetics of ligand binding and catalytic
efficiency; they also serve to emphasize how difficult it is to design de novo stable
proteins with specific functions. However, by using the methods of engineering
and design, we are now at least increasing rapidly our basic knowledge of the
function of protein molecules. For example, we now know that the difference
in energetic terms between the transition states of a naturally evolved useful
enzyme and an engineered useless mutant corresponds to less than the energy
of a single hydrogen bond, even for such important life-sustaining enzymes as
the CO2-fixing enzyme in green plants, rubisco (ribulose-1,5-bisphosphate
carboxylase/oxygenase).

Knowledge of a protein's tertiary structure is a prerequisite for the proper
engineering of its function. Unfortunately, in spite of recent significant techno-





Exhibit 3 



XP-0021698'v 



Proc. Natl. Acad. Sci. USA
Vol. 96, pp. 11054-11061, September 1999
Colloquium Paper

This paper was presented at the National Academy of Sciences colloquium "Proteolytic
Processing and Physiological Regulation," held February 20-21, 1999, at the Arnold and
Mabel Beckman Center in Irvine, CA.

Reverse biochemistry: Use of macromolecular protease inhibitors
to dissect complex biological processes and identify a membrane-type
serine protease in epithelial cancer and normal tissue

TOSHIHIKO TAKEUCHI*, MARC A. SHUMAN†, AND CHARLES S. CRAIK*‡

*Departments of Pharmaceutical Chemistry and Biochemistry & Biophysics, and †Department
of Medicine, University of California, San Francisco, CA



ABSTRACT Serine proteases of the chymotrypsin fold are of great interest because
they provide detailed understanding of their enzymatic properties and their proposed role
in a number of physiological and pathological processes. We have been developing the
macromolecular inhibitor ecotin to be a "fold-specific" inhibitor that is selective for
members of the chymotrypsin-fold class of proteases. Inhibition of protease activity
through the use of wild-type and engineered ecotins results in inhibition of rat prostate
differentiation and retardation of the growth of human PC-3 prostatic cancer tumors. In
an effort to identify the proteases that may be involved in these processes, reverse
transcription-PCR with PC-3 poly(A)+ mRNA was performed by using degenerate
oligonucleotide primers. These primers were designed by using conserved protein
sequences unique to chymotrypsin-fold serine proteases. Five proteases were identified:
urokinase-type plasminogen activator, factor XII, protein C, trypsinogen IV, and a
protease that we refer to as membrane-type serine protease 1 (MT-SP1). The cloning and
characterization of the MT-SP1 cDNA shows that it encodes a mosaic protein that contains
a transmembrane signal anchor, two CUB domains, four LDLR repeats, and a serine protease
domain. Northern blotting shows broad expression of MT-SP1 in a variety of epithelial
tissues with high levels of expression in the human gastrointestinal tract and the
prostate. A His-tagged fusion of the MT-SP1 protease domain was expressed in Escherichia
coli, purified, and autoactivated. Ecotin and variant ecotins are subnanomolar inhibitors
of the MT-SP1 activated protease domain, suggesting a possible role for MT-SP1 in
prostate differentiation and the growth of prostatic carcinomas.

Serine proteases possessing a chymotrypsin fold are of great interest because they
provide detailed understanding of their enzymatic properties and their proposed role in a
number of physiological and pathological processes. A wealth of information exists on
structure-function relationships regarding this large class of enzymes. Moreover, potent
and specific inhibitors are readily available for use in dissecting the function of these
enzymes. These proteases exist as precursors that are activated by specific and limited
proteolysis, allowing regulation of enzyme activity (1). Examples of this type of
regulation include blood coagulation (2), fibrinolysis (3), complement activation (4),
and trypsinogen activation by enteropeptidase in digestion (5). The precise control of
these activation processes is crucial for normal physiological enzymatic function;
misregulation of these enzymes can lead to pathological conditions (2-5).

We are interested in studying the role of these chymotrypsin-fold serine proteases in
cancer by using a "fold-specific" inhibitor, ecotin (6, 7). Ecotin or engineered versions
of ecotin can be introduced into complex biological systems as probes of proteolysis by
these chymotrypsin-fold proteases. If effects are observed on treatment with these unique
inhibitors, then the large body of knowledge concerning the biochemistry of these
proteases can be tapped to understand the structure and function of the target
proteases. For example, the molecular cloning, structural modeling, and mechanistic
understanding of the enzymes are immediately accessible. We refer to this approach, which
is analogous to "reverse genetics," as "reverse biochemistry," and we have applied it to
identification of specific serine proteases in prostate cancer.

PNAS is available online at www.pnas.org.

Urokinase-type plasminogen activator (uPA) has been implicated in tumor-cell invasion
and metastasis. Cancer-cell invasion into normal tissue can be facilitated by uPA through
its activation of plasminogen, which degrades the basement membrane and extracellular
matrix (reviewed in refs. 8 and 9). The role of other serine proteases in cancer has been
less well characterized.

One useful model system for studying many issues that are pertinent to prostate cancer
is the development of the rodent ventral prostate in explant cultures. Macromolecular
inhibitors of serine proteases of the chymotrypsin fold, ecotin and ecotin M84R/M85R
(6, 7), inhibit ductal branching morphogenesis and differentiation of the explanted rat
ventral prostate (F. Elfman, T.T., C.C., G. Cunha, and M.S., unpublished data). Ecotin
M84R/M85R is a 2,800-fold more potent inhibitor of uPA than ecotin (1 nM vs. 2.8 μM) (6).
However, inhibition of prostate differentiation was seen with both inhibitors, suggesting
that uPA and other related serine proteases are involved in the differentiation and
continued growth of the rat ventral prostate. Thus, unidentified serine proteases may
play a role in growth and prevention of apoptosis in prostate epithelial cells in this
system.

Another well characterized model that is derived from human prostate cancer epithelial
cells is the PC-3 cell line (10). The PC-3 cell line expresses uPA as assayed by ELISA and
by Northern blotting of PC-3 mRNA (11). We found that the primary tumor size in
PC-3-implanted nude mice was significantly smaller in both ecotin M84R/M85R and ecotin
wild-type treated mice treated for 7 weeks compared with the primary tumor size of
PBS-treated mice. Metastases from the primary tumors were similarly lower in the
inhibitor-treated mice than in PBS-treated mice (O. Melnyk, T.T., C.C., and M.S.,
unpublished data). Inhibition was not unexpected with ecotin M84R/M85R treatment, because
uPA has been implicated in metastasis. However, wild-type ecotin is a poor, micromolar
inhibitor of uPA; one interpretation of the data is that the decrease in tumor size and
metastasis in the mouse model involves the inhibition of additional serine proteases.
Thus, identification of the serine proteases expressed by PC-3 prostate cells may provide
insight into the role of these proteases in cancer and prostate growth and development.
In this report we have extended the strategy of using PCR with degenerate oligonucleotide
primers that were designed by using conserved sequence homology (12-14) to identify
additional serine proteases made by cancer cells. Five independent serine protease cDNAs
derived from PC-3 mRNA were sequenced, including a novel serine protease, which we refer
to as membrane-type serine protease 1 (MT-SP1), and the cloning and characterization of
this cDNA that encodes a mosaic, transmembrane protease is reported.

Abbreviations: MT-SP1, membrane-type serine protease 1; CUB, complement factor
1R-urchin embryonic growth factor-bone morphogenetic protein; LDLR, low density
lipoprotein receptor; uPA, urokinase-type plasminogen activator; pNA, p-nitroanilide.
Data deposition: The sequences reported in this paper have been deposited in the GenBank
database (accession nos. BankIt257050 and [remainder illegible]).
‡To whom reprint requests should be addressed. E-mail: craik@cgl.ucsf.edu.

MATERIALS AND METHODS 

Materials. All primers used were synthesized on an Applied Biosystems 391 DNA
synthesizer. All restriction enzymes were purchased from New England Biolabs. Automated
DNA sequencing was carried out on an Applied Biosystems 377 Prism sequencer, and manual
DNA sequencing was carried out under standard conditions. N-terminal amino acid
sequencing was performed on an ABI 477A by the University of California, San Francisco
Biomolecular Resource Center. The synthetic substrates, Suc-AAPX-p-nitroanilide (pNA)
[N-succinyl-alanyl-alanyl-prolyl-Xxx-pNA (Xxx = alanyl, aspartyl, glutamyl, phenylalanyl,
leucinyl, methionyl, or arginyl)], and H-Arg-pNA (arginyl-pNA), were purchased from
Bachem. Deglycosylation was performed by using PNGase F (NEB, Beverly, MA). All other
reagents were of the highest quality available and purchased from Sigma or Fisher unless
otherwise noted.

Isolation of cDNA from PC-3 Cells. mRNA was isolated from PC-3 cells by using the
PolyATtract System 1000 kit (Promega). Reverse transcription was primed by using the
"lock-docking" oligo(dT) primer (15). SuperScript II reverse transcriptase (Life
Technologies, Grand Island, NY) was used in accordance with the manufacturer's
instructions to synthesize the cDNA from the PC-3 mRNA.

Amplification of MT-SP1 Gene. The degenerate primers used for amplifying the protease
domains were designed from the consensus sequences flanking the catalytic histidine (5'
His-primer) and the catalytic serine (3' Ser-primer), similar to those described (12).
The 5' primer used is as follows: 5'-TGG (AG)TI (CAG)TI (AT)(GC)I GCI (GA)CI CA(CT)
TG-3', where nucleotides in parentheses represent equimolar mixtures and I represents
deoxyinosine. This primer encodes at least the following amino acid sequence:
W (I/V) (I/V/L/M) (S/T) A (A/T) H C. The 3' primer used is as follows: 5'-IGG ICC ICC
I(GC)(AT) (AG)TC ICC (CT)TI (GA)CA IG(ATC) (GA)TC-3'. The reverse complement of the 3'
primer encodes at least the following amino acid sequence: D (A/S/T) C (K/E/Q/H) G D S
G G P.
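The relation between a degenerate primer and the residues it can encode may be checked
with a short script. The following is a minimal sketch, not part of the original paper.
The primer string is the 5' His-primer as read from this copy, which is partly garbled,
so the string itself is an assumption. Deoxyinosine is treated as pairing with any of
the four bases, which is a simplification, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def parse_positions(primer):
    """Turn 'TGG (AG)TI ...' into a list of base sets, one per nucleotide position."""
    s = primer.replace(" ", "")
    positions, i = [], 0
    while i < len(s):
        if s[i] == "(":                      # equimolar mixture, e.g. (AG)
            j = s.index(")", i)
            positions.append(set(s[i + 1:j]))
            i = j + 1
        else:                                # single base; I (inosine) is taken as any base
            positions.append(set("ACGT") if s[i] == "I" else {s[i]})
            i += 1
    return positions


def encoded_residues(primer):
    """Return, for each complete codon, the set of amino acids the mixture can encode."""
    positions = parse_positions(primer)
    usable = len(positions) - len(positions) % 3   # ignore a trailing partial codon
    residues = []
    for k in range(0, usable, 3):
        codons = ("".join(c) for c in product(*positions[k:k + 3]))
        residues.append({CODON_TABLE[c] for c in codons})
    return residues


# Assumed reading of the 5' His-primer (the final "TG" is a partial Cys codon).
HIS_PRIMER = "TGG (AG)TI (CAG)TI (AT)(GC)I GCI (GA)CI CA(CT) TG"
residues = encoded_residues(HIS_PRIMER)
for n, aa in enumerate(residues, start=1):
    print(n, "".join(sorted(aa)))
```

Under these assumptions each codon position covers the residues quoted in the text,
W (I/V) (I/V/L/M) (S/T) A (A/T) H, and some positions admit further residues, which is
consistent with the wording "at least the following amino acid sequence."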

Direct amplification of serine protease cDNA was not possible by using the above
primers. Instead, the first PCR was performed with the 5' His-primer and the oligo(dT)
primer described above, by using the "touchdown" PCR protocol (16), with annealing
temperatures decreasing from 52°C to [value illegible] over 22 rounds and 13 final rounds
at 54°C annealing temperature. Cycle times were 1 min (denaturing), 1 min (annealing),
and 2 min (extension) and were followed by one final extension time of 15 min after the
final round of PCR. The template for the second PCR was 0.5 μl (total reaction volume
50 μl) of a 1:10 dilution of the first PCR mixture that was performed with the 5'
His-primer and the oligo(dT). The second PCR reaction was primed with the 5' His- and the
3' Ser-primers and performed by using the touchdown protocol described above. All PCRs
used 12.5 pmol of primer for 50-μl reaction volume.

The product of the second reaction was purified on a 2% agarose gel, and all products
between 400 and 550 bp were cut from the gel and extracted by using the QIAquick gel
extraction kit (Qiagen, Chatsworth, CA). These products were digested with the BamHI
restriction enzyme to cut any uPA cDNA, and all 400- to 500-bp fragments were repurified
on a 2% agarose gel. These reaction products were subjected to a third PCR by using the
5' His-primer and the 3' Ser-primer by using the identical touchdown procedure. These
reaction products were gel-purified and directly cloned into the pPCR2.1 vector by using
the TOPO TA ligation kit (Invitrogen). DNA sequencing of the inserts determined the cDNA
sequence from nucleotides 1,984 to 2,460 (see Fig. 1).

Northern Blot Analysis. 32P-labeled nucleotides were purchased from Amersham
Pharmacia. A cDNA fragment containing nucleotides 1,173-2,510 was digested from expressed
sequence tag w39209 by using restriction enzymes EcoRI and BsmBI, yielding a 1.3-kilobase
nucleotide insert. Labeled cDNA probes were synthesized by using the Rediprime random
primer labeling kit (Amersham Pharmacia) and 20 ng of the purified insert. Poly(A)+ RNA
membranes for Northern blotting were purchased from Origene (Rockville, MD; HB-1002
HB-IOIS) and CLONTECH (Human II 7759-1, Human Cancer Cell Line 7757). The blots were
performed under stringent annealing conditions as described in ref. 17.

Construction of Expression Vectors. The mature protease domain and a small portion of
the pro-domain (nucleotides 1,822-2,601) cDNA were amplified by using PCR from expressed
sequence tag w39209 and ligated into the pQE30 vector (Qiagen). This construct is designed
to overexpress the protease sequence from amino acids 596-855 with the following fusion:
Met-Arg-Gly-Ser-His6-aa596-855. The His-tag fusion allows affinity purification by using
metal-chelate chromatography. The change from Ser-805, encoded by TCC, to Ala (GCT) was
performed by using PCR. The presence of the correct Ser → Ala substitution in the pQE30
vector was verified by DNA sequence analysis.

Expression and Purification of the Protease Domain. The above-mentioned plasmids were
separately transformed into Escherichia coli X-90 to afford high-level expression of
recombinant protease gene products (18). Expression and purification of the recombinant
enzyme from solubilized inclusion bodies was performed as described (19). Protein-containing
fractions were pooled and dialyzed overnight at 4°C against 50 mM Tris (pH 8), 10%
glycerol, 1 mM 2-mercaptoethanol, and 3 M urea. Autoactivation of the protease was
monitored on dialysis against storage buffer (50 mM Tris, pH 8/10% glycerol) at 4°C by
using the substrate Spectrozyme tPA (hexahydrotyrosyl-Gly-Arg-pNA, American Diagnostica,
Greenwich, CT). Hydrolysis of Spectrozyme tPA was monitored at 405 nm for the formation
of p-nitroaniline by using a Uvikon 860 spectrophotometer. Activated protease was bound
to an immobilized p-aminobenzamidine resin (Pierce) that had been equilibrated with
storage buffer. Bound protease was eluted with 100 mM benzamidine and the
protein-containing fractions were pooled. Excess benzamidine was removed by using FPLC
with a Superdex 70 (Amersham Pharmacia) gel filtration column that was equilibrated with
storage buffer. Protein-containing fractions were pooled and stored at -80°C. The
cleavage of the purified Ser805Ala protease domain was performed at 37°C by addition of
active recombinant protease domain to 10 nM. Cleavage was monitored by using SDS/PAGE.
Fig. 1. Nucleotide sequence of the cDNA encoding MT-SP1 and the predicted amino acid
sequence (the sequence listing and part of the legend are not legible in this copy).
Amino acids are shown in single-letter code. The catalytic triad in the serine protease
domain is highlighted: His-656, Asp-711, and Ser-805.

Determination of Substrate Kinetics. The purified serine protease domain was titrated
with 4-methylumbelliferyl p-guanidinobenzoate (MUGB) to obtain an accurate concentration
of enzyme active sites (20). Enzyme activity was monitored at 25°C in assay buffer
containing 50 mM Tris (pH 8.8), 50 mM NaCl, and 0.01% Tween 20. The final concentration
of substrate Spectrozyme tPA ranged from 1 to 400 μM. Enzyme concentrations ranged from
40 to 800 pM. Active-site titrations were performed on a Fluoromax-2 spectrofluorimeter.
Measurements were plotted by using the KALEIDAGRAPH program (Synergy Software, Reading,
PA), and the Km, kcat, and kcat/Km for Spectrozyme tPA were
determined by using the Michaelis-Menten equation.
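How Km, kcat, and kcat/Km follow from rate measurements can be illustrated with a small
calculation. The sketch below is not the authors' procedure: it substitutes a
Hanes-Woolf linearization (a straight-line fit of S/v against S) for the curve fitting
done in the plotting program, and it runs on synthetic, noise-free data. The values of
TRUE_KCAT, TRUE_KM, and E_TOTAL are hypothetical and are not results from the paper;
only the 1 to 400 μM substrate range is taken from the text.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)


def fit_hanes_woolf(substrate, rates):
    """Least-squares line through (S, S/v): slope = 1/Vmax, intercept = Km/Vmax."""
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Hypothetical parameters used only to generate example data.
E_TOTAL = 1e-4      # enzyme active-site concentration in uM (100 pM)
TRUE_KCAT = 10.0    # per second (assumed)
TRUE_KM = 50.0      # uM (assumed)

substrate = [1, 5, 10, 25, 50, 100, 200, 400]          # uM, spanning the stated range
rates = [michaelis_menten(s, TRUE_KCAT * E_TOTAL, TRUE_KM) for s in substrate]

vmax, km = fit_hanes_woolf(substrate, rates)
kcat = vmax / E_TOTAL                                   # kcat = Vmax / [E]
print(f"Km = {km:.3f} uM, kcat = {kcat:.3f} 1/s, kcat/Km = {kcat / km:.4f} 1/(uM s)")
```

With real, noisy measurements a direct nonlinear fit of the Michaelis-Menten equation is
usually preferred, because the linearization weights the data points unevenly.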

Inhibition of MT-SP1 Protease Domain with Ecotin and Ecotin M84R/M85R. Ecotin and
ecotin M84R/M85R were purified from E. coli as described (6). Various concentrations of
ecotin or ecotin M84R/M85R were incubated with the His-tagged serine protease domain in a
total volume of 990 μl of buffer containing 50 mM NaCl, 50 mM Tris-HCl (pH 8.8), and
0.01% Tween 20. Ten microliters of Spectrozyme tPA was added, yielding a solution
containing 100 μM substrate. The final enzyme concentration was 63 pM, and the ecotin and
ecotin M84R/M85R concentration ranged from 0.1 to 50 nM. The data were fit to the
equation derived for kinetics of reversible tight-binding inhibitors (21, 22), and the
values for
apparent Ki were determined.
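A commonly used expression for reversible tight-binding inhibition is the quadratic
form often called the Morrison equation. Whether the authors used exactly this form is
an assumption; the text cites refs. 21 and 22 without writing the equation out. The
sketch below evaluates the fractional activity over the stated inhibitor range at the
stated enzyme concentration (63 pM). The apparent Ki of 0.5 nM is a hypothetical value
chosen for illustration, not a result from the paper.

```python
import math


def fractional_activity(e_total, i_total, ki_app):
    """v_i / v_0 for a reversible tight-binding inhibitor (quadratic, Morrison-type form).

    All three arguments must be in the same concentration units.
    """
    b = e_total + i_total + ki_app
    # Concentration of enzyme-inhibitor complex from the binding equilibrium.
    complex_conc = (b - math.sqrt(b * b - 4.0 * e_total * i_total)) / 2.0
    return 1.0 - complex_conc / e_total


E_TOTAL_NM = 0.063        # 63 pM enzyme, as stated in the text, expressed in nM
KI_APP_NM = 0.5           # hypothetical apparent Ki in nM

inhibitor_nm = [0.0, 0.1, 0.5, 1.0, 5.0, 10.0, 50.0]
curve = [fractional_activity(E_TOTAL_NM, i, KI_APP_NM) for i in inhibitor_nm]
for i, frac in zip(inhibitor_nm, curve):
    print(f"[I] = {i:5.1f} nM  ->  v_i/v_0 = {frac:.3f}")
```

The quadratic form matters here because the inhibitor concentrations are not always in
large excess over the enzyme, so the amount of inhibitor bound to enzyme cannot be
neglected. In a fit, ki_app would be the adjustable parameter.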

RESULTS 

Cloning of Serine Protease Domain cDNAs from PC-3 Cells and Amplification of MT-SP1
cDNA. PCR amplification of serine protease cDNA was performed by using "consensus
cloning," where the amplification was performed with degenerate primers designed to
anneal to cDNA encoding the region about the conserved catalytic histidine (5'
His-primer) and the conserved catalytic serine (3' Ser-primer). The consensus primers
were designed by using 37 human sequences within a sequence alignment of 242 serine
proteases of the chymotrypsin fold that are reported in the SwissProt database. To bias
the screen for previously unidentified proteases in the PC-3 cDNA, uPA cDNA was cut and
removed by using the known BamHI endonuclease site in the uPA cDNA sequence. The expected
size of the cDNA fragments amplified between His-57 and Ser-195 cDNA (standard
chymotrypsinogen numbering) is between 400 and 550 bp; statistically, only 1 in 10 cDNAs
of that length will be cleaved by BamHI. Thus, cDNAs obtained from the PCR reactions with
the 5' His-primer and 3' Ser-primer were size selected for the 400- to 550-bp range,
digested with BamHI, and purified from any digested cDNAs. After a subsequent round of
PCR, the products were cloned into pPCR2.1 (Fig. 2). Twenty clones were digested with
EcoRI to monitor the size of the cDNA insert. Three clones lacked inserts of the correct
size. The remaining 17 clones containing inserts between 400 and 550 bp were sequenced.
BLAST searches of the resulting sequences revealed that six clones did not match serine
protease sequences. The remaining cDNAs yielded clones corresponding to factor XII (two
clones), protein C (two clones), trypsinogen type IV (two clones), uPA (one clone), and
MT-SP1 (four clones). Additional serine protease sequences may not have been found
because they were digested by BamHI, lost in the size selection,
or present in lower frequencies.
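The "1 in 10" estimate for BamHI cleavage can be reproduced with a simple probability
calculation. The sketch below is an illustration rather than the authors' derivation.
It assumes a random sequence with equal base frequencies and treats each 6-bp window as
independent, both of which are simplifications; the function name is hypothetical.
BamHI recognizes the 6-bp site GGATCC.

```python
def prob_site_present(length_bp, site_len=6):
    """Chance that a random sequence of the given length contains a specific site.

    Assumes equal base frequencies and independent windows (an approximation).
    """
    windows = length_bp - site_len + 1          # number of positions where the site can start
    p_window = 0.25 ** site_len                 # 1/4096 for a 6-bp site
    return 1.0 - (1.0 - p_window) ** windows


for length in (400, 475, 550):
    p = prob_site_present(length)
    print(f"{length} bp: P(cut) = {p:.3f}  (about 1 in {1 / p:.0f})")
```

Under these assumptions the probability lies between roughly 0.09 and 0.125 across the
400- to 550-bp range, which agrees with the figure of about 1 in 10 given in the text.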



[Fig. 2 gel image not reproduced; size markers labeled 1,000, 500, 400, and 300 bp.]

Fig. 2. Lane 1 shows the PCR products obtained by using degenerate primers designed
from the consensus sequences flanking the catalytic histidine (5' His-primer) and the
catalytic serine (3' Ser-primer). The products remaining between 400 and 550 bp after
digestion with BamHI were reamplified by using the same degenerate primers. The products
from this second PCR are shown in Lane 2.

Multiple expressed sequence tag sequences were found for
the cDNA. Expressed sequence tag accessions aa459076,
aa219372, and w39209 were used extensively for sequencing
the cDNA starting from nucleotide 746 and 2,461-3,142, but
no start codon was observed. A sequence was also found in
GenBank (accession no. U20428). This sequence also lacks the
5' end of the cDNA but allowed amplification of cDNA from
nucleotides 196-745. Rapid amplification of cDNA ends
(RACE) (23) was used to obtain further 5' cDNA sequence.
Application of RACE did not yield a clone containing the
entire 5'-untranslated region, but the sequence obtained
contained a stop codon in-frame with the Kozak start sequence
(24), giving confidence that the full coding sequence of the
cDNA has been obtained. The nucleotide sequence and
predicted amino acid sequence are shown in Fig. 1.
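
The two checks described above (a start codon in a Kozak context and a stop codon upstream and in-frame with it) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, the Kozak pattern is the ACCATGG consensus, and the toy sequence is invented for the example and is not MT-SP1 sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
KOZAK = "ACCATGG"  # consensus context; the ATG starts at offset 3


def find_kozak_start(seq):
    """Return the index of the A of ATG in the first ACCATGG match, or -1."""
    i = seq.upper().find(KOZAK)
    return -1 if i < 0 else i + 3


def upstream_inframe_stop(seq, atg_index):
    """Walk upstream from the ATG in steps of three bases and return the
    index of the first in-frame stop codon found, or -1 if there is none."""
    seq = seq.upper()
    pos = atg_index - 3
    while pos >= 0:
        if seq[pos:pos + 3] in STOP_CODONS:
            return pos
        pos -= 3
    return -1


# Invented toy sequence: GG | TGA | GCC | ACC | ATG | G...
toy = "GGTGAGCCACCATGGCTTAA"
atg = find_kozak_start(toy)
print(atg, upstream_inframe_stop(toy, atg))
```

An in-frame stop codon upstream of the candidate ATG means that translation could not have begun further 5' in that reading frame, which is why it supports the assignment of the start site.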

The nucleotide sequence surrounding the proposed start 
codon matches the optimal sequence of ACCATGG for 
translation initiation sites proposed by Kozak (24). In addition, 
there is a stop codon in-frame with the putative start codon, 
which gives further evidence that initiation occurs at that site. 
The DNA sequence predicts an 855-aa mosaic protein com- 
posed of multiple domains (Fig. 3). The coding sequence does 
not contain a typical signal peptide but does contain a single



[Figure 3: domain diagrams for MT-SP1, enteropeptidase, and hepsin; scale bar, 100 aa]

Fig. 3. The domain structure of human MT-SP1 is compared with
the domain structure of enteropeptidase (47) and hepsin (25). SA,
possible signal anchor; CUB, a repeat first identified in complement
components C1r and C1s, the urchin embryonic growth factor, and
bone morphogenetic protein 1 (27); L, LDLR repeat (29); SP, a
chymotrypsin family serine protease domain (40); MAM, a domain
homologous to members of a family defined by meprin, protein A5,
and the protein tyrosine phosphatase μ (48); MSCR, a macrophage
scavenger receptor cysteine-rich motif (29). The predicted disulfide
linkages are shown labeled as C-C.



Proc. Natl. Acad. Sci. USA 96 (1999) 11057

hydrophobic sequence of 26 residues (residues 55-81), which
is flanked by a charged residue on each side. This sequence
may constitute a signal anchor sequence, similar to that
observed in other proteases, including hepsin (25) and
enteropeptidase (26). Following the putative signal anchor
sequence are two complement factor 1R-urchin embryonic
growth factor-bone morphogenetic protein (CUB) domains
(27), which are named after the proteins in which the modules
were first discovered: complement subcomponents C1s and
C1r, urchin embryonic growth factor (Uegf), and bone
morphogenetic protein 1 (BMP1). CUB domains have conserved
characteristics, which include the presence of four cysteine
residues and various conserved hydrophobic and aromatic
positions (27). The CUB domain, which has recently been
characterized crystallographically (28), consists of 10 β-strands
that are organized into two 5-stranded β-sheets. Following the
CUB domains are four low-density lipoprotein receptor
(LDLR) repeats (29), which are named after the receptor
ligand-binding repeats that are present in the LDLR. These
repeats have a highly conserved pattern and spacing of six
cysteine residues that form three intramolecular disulfide
bonds. The final domain observed is the serine protease
domain. The alignments of these domains with other members
of their respective classes are shown in Fig. 4.

Tissue Distribution of MT-SP1 mRNA. Northern blots of
human poly(A)+ RNA, made by using a 1.3-kilobase fragment
of MT-SP1 cDNA as a probe, show a ~3.3-kilobase
fragment appearing in epithelial tissues including the prostate,
kidney, lung, small intestine, stomach, colon, and placenta, as
well as other tissues, including spleen, liver, leukocytes, and
thymus. This band was not observed in muscle, brain, ovary, or
testis (Fig. 5). Similar experiments performed on a human
cancer cell line blot show that MT-SP1 is expressed in the
colorectal adenocarcinoma SW480 but was not observed in
the promyelocytic leukemia HL-60, HeLa cell S3, chronic
myelogenous leukemia K-562, lymphoblastic leukemia
MOLT-4, Burkitt's lymphoma Raji, lung carcinoma A549, or
melanoma G361 lanes (data not shown). This 3.3-kilobase
mRNA fragment is slightly longer than the 3.1-kilobase
sequence presented in Fig. 5, suggesting that there may still be
sequence in the 5'-untranslated region that has not been
identified.

Activation and Purification of His-MT-SP1 Protease Domain.
The serine protease domain of MT-SP1 was expressed
in E. coli as a His-tagged fusion and was purified from inclusion
bodies under denaturing conditions by using metal-chelate
affinity chromatography. The yield of enzyme after this step
was ≈3 mg of protein per liter of E. coli culture. This denatured
protein refolded when the urea was dialyzed from the protein.
Surprisingly, the purified renatured protein showed a
time-dependent shift on an SDS/PAGE gel (Fig. 6A), with the lower
fragment being the size of the mature, processed enzyme
lacking the His tag. N-terminal sequencing of the purified,
activated protease domain yielded the expected VVGGT
activation sequence. When the refolded protein was tested for
activity by using the synthetic substrate Spectrozyme tPA, a
time-dependent increase in activity was observed (Fig. 6B). In
contrast, the protease domain that contains the Ser195Ala
mutation showed neither a change in size on an SDS
polyacrylamide gel nor an increase in enzymatic activity under
identical conditions (data not shown), suggesting that the
catalytic serine is necessary for activation and that activation
is not the result of a contaminating protease. To show that the
cleavage of the protease domain was a result of His-tagged
MT-SP1 protease activity, the inactive Ser195Ala protease
domain was treated with purified recombinant enzyme (Fig. 6C).
This treatment results in the formation of a cleavage product
that corresponds to the size of the active protease (Fig. 6C,
lane 7). Untreated protease domain does not get cleaved
(Fig. 6C, lane 8). From these results, it is concluded that the
protease autoactivates on



refolding. The activated protease was separated from inactive
protein and other contaminants by using affinity chromatography
with p-aminobenzamidine resin. Purified protein was
analyzed by using SDS/PAGE, and no other contaminants
were observed. Similarly, immunoblotting with polyclonal
antiserum against purified protease domain (raised in rabbits
at Berkeley Antibody, Richmond, CA) revealed one band.
Under nonreducing conditions, the pro region is disulfide-linked
to the protease domain; thus, this purified protein was
also immunoreactive with the mAb (Qiagen, Chatsworth, CA)
directed against the N-terminal Arg-Gly-Ser-His epitope that
is contained in the recombinant protease domain, further
indicating the purity and identity of the protein (data not
shown).

[Figure 4: multiple sequence alignments. (A) Protease domains. (B) LDL repeats. (C) CUB domains. Secondary structure legend: α-helices; S-S, disulfides.]

Kinetic Properties of Purified His-MT-SP1 Protease Domain.
The enzyme concentration was determined by using an
active site titration with MUGB. The catalytic activity of the
protease domain was monitored by using pNA substrates.



[Figure 5: Northern blot image; RNA size standards at 9.49, 7.46, 4.40, 2.37, and 1.35 kb]

Fig. 5. Tissue distribution of MT-SP1 mRNA levels. Northern
blots of human poly(A)+ RNA from assorted human tissues were
hybridized with radiolabeled cDNA probes as described in Materials
and Methods. Upper shows hybridization by using a MT-SP1
1.3-kilobase cDNA fragment derived from expressed sequence tag clone
W39209 and exposed overnight. Lower shows the same blot after being
stripped and rehybridized with a loading control, a
human glyceraldehyde phosphate dehydrogenase (GAPDH)
cDNA probe, exposed for 2 hours. The mobility of RNA size standards
is indicated at the left.

Purified protease domain was tested for hydrolytic activity
against tetrapeptide substrates of the form Suc-AAPX-pNA,
which contained various amino acids at the P1 position (P1-Ala,
Asp, Glu, Phe, Leu, Met, Lys, or Arg). The only substrates
with detectable activity were those with P1-Lys or P1-Arg. The
serine protease domain with the Ser195Ala mutation had no
detectable activity. The activity of the protease domain was
further characterized by using the substrate Spectrozyme tPA,
yielding Km = 31.4 ± 4.2 μM, kcat = 2.6 × 10^[illegible] s^-1
[uncertainty illegible], and kcat/Km = 6.9 × 10^[illegible] ±
2.3 × 10^[illegible] M^-1·s^-1. Ecotin inhibition of
the MT-SP1 His-tagged protease domain fits a tight-binding
reversible inhibitory model (21, 22), as observed for ecotin
interaction with other serine proteases [citation illegible].
Inhibition assays by using ecotin and ecotin M84R/M85R
yielded apparent Ki values of 782 ± 92 pM and 9.8 ± 1.5 pM,
respectively.
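
The tight-binding reversible model cited above (21, 22) is commonly written as the Morrison equation, in which the fraction of enzyme in complex with inhibitor depends on the total enzyme concentration as well as on the inhibitor concentration and the apparent Ki. The sketch below shows that relationship under stated assumptions: only the two apparent Ki values come from the text, while the enzyme and inhibitor concentrations and all names are hypothetical choices for illustration.

```python
import math


def fractional_activity(e_total, i_total, ki_app):
    """Morrison equation for a tight-binding reversible inhibitor.

    Returns v_i / v_0, the activity remaining relative to the
    uninhibited enzyme. All concentrations share the same units.
    """
    s = e_total + i_total + ki_app
    # Concentration of the enzyme-inhibitor complex (smaller quadratic root)
    ei = (s - math.sqrt(s * s - 4.0 * e_total * i_total)) / 2.0
    return 1.0 - ei / e_total


KI_ECOTIN = 782e-12      # apparent Ki from the text, wild-type ecotin (M)
KI_M84R_M85R = 9.8e-12   # apparent Ki from the text, ecotin M84R/M85R (M)
E_TOTAL = 100e-12        # hypothetical enzyme concentration (M)
I_TOTAL = 1e-9           # hypothetical inhibitor concentration (M)

for name, ki in (("ecotin", KI_ECOTIN), ("ecotin M84R/M85R", KI_M84R_M85R)):
    print(name, round(fractional_activity(E_TOTAL, I_TOTAL, ki), 3))
```

At equal inhibitor concentration the variant with the lower apparent Ki leaves less residual activity, which is the sense in which it is the tighter-binding inhibitor.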

DISCUSSION 

Structural Motifs of MT-SP1. In this work, we characterized
the expression of chymotrypsin-fold proteases by PC-3 cells
and cloned a member of this family we call MT-SP1. The name
membrane-type serine protease 1 (MT-SP1) is given to be
consistent with the nomenclature of membrane-type matrix
metalloproteases (MT-MMPs; ref. 32). The cDNA likely
encodes a membrane-type protein because of the lack of a
signal sequence and the presence of a putative SA that is also
seen in other membrane-type serine proteases: hepsin (25),
enteropeptidase (26), TMPRSS2 (32), and human airway
trypsin-like protease (33). We propose that proteins that are
localized to the membrane through a SA and that encode a
chymotrypsin fold serine protease domain be categorized in
the MT-SP family. The membrane localization of MT-SP1 is
supported by immunofluorescence experiments that localize
the protease domain to the extracellular cell surface
(unpublished results).

Following the putative SA are several domains that are
thought to be involved in protein-protein interactions or
protein-ligand interactions. For example, CUB domains can
mediate protein-protein interactions, as with the seminal
plasma PSP-I/PSP-II heterodimer that is built by CUB-domain
interactions (28) and with procollagen C-proteinase
enhancer protein and procollagen C-proteinase (BMP-1) (34,
35). Interestingly, most of the proteins that contain CUB
domains are involved in developmental processes or are
involved in proteolytic cascades (27), which suggests that
MT-SP1 may play a similar role. The four repeated motifs that
follow the CUB domains are known as LDLR ligand-binding
repeats, named after the seven copies of repeats found in the
LDLR. There are several negatively charged amino acids
between the fourth and sixth cysteines that are highly
conserved in the LDLR and are also seen in the LDLR repeats of
MT-SP1. The conserved motif Ser-Asp-Glu (residues 44-46 in
Fig. 4) is known to be important for binding the positively
charged residues of the LDLR ligands apolipoprotein B-100
(ApoB-100) and ApoE (29). The ligand-binding repeats of
MT-SP1 most likely do not mediate interaction with ApoB-100
or ApoE but may be involved in the interaction with other
positively charged ligands. For example, LDLR repeats in the
LDLR-related protein have been implicated in the binding and
recycling of protease-inhibitor complexes such as
uPA-plasminogen activator inhibitor-1 (PAI-1) complexes
(reviewed in refs. 36 and 37). It also has been shown that the pro
domain of enteropeptidase is involved in interactions with its
substrate trypsinogen, allowing 520-fold greater catalytic
efficiency in the cleavage compared with the protease domain
alone (38). By analogy, similar interactions should occur
between MT-SP1 and its substrates. Thus, further investigation
of MT-SP1 CUB domain or LDLR repeat interactions may
yield insight into the function of this protein.

The amino acid sequence of the serine protease domain of 
MT-SPl is highly homologous to other proteases found in the 
family (Fig. 4). The essential features of a functional serine
protease are contained in the deduced amino acid sequence of 



[Figure 6: (A) SDS/PAGE time course titled "Autoactivation of MT-SP1", 0-6 hours, bands at 32.5 kD and 25 kD; (B) activity time course; (C) SDS/PAGE time course, 0-120 minutes, bands at 32.5 kD and 25 kD, with + and - lanes]

Fig. 6. Activation and purification of His-tagged MT-SP1 protease
domain. A representative experiment is shown in A and B. (A)
[illegible] was monitored by using SDS/PAGE. The upper band
represents inactivated protease domain, and the lower band represents
active protease (also verified by N-terminal sequencing). (B) The
activation of the protein was monitored by using Spectrozyme tPA as
a synthetic substrate for the protease domain. (C) [illegible]
protease domain is cleaved with 10 nM activated His-tagged MT-SP1
protease domain at 37°C. The specific cleavage of active MT-SP1
protease domain is required for proper processing at the activation
site. Active protease domain is shown in lane 7 (+), and no cleavage
of the untreated inactive protease domain is observed (lane 8, -).



the domain. The residues that comprise the catalytic triad,
His-656, Asp-711, and Ser-805, corresponding to His-57,
Asp-102, and Ser-195 in chymotrypsin, are observed in MT-SP1 (for
reviews, see refs. 39 and 40). The sequence Ser-Trp-Gly
[residue numbers illegible], which is thought to interact with the side
chains of the substrate for properly orienting the scissile bond,
is present. Gly-193 (Gly-803) and Gly-196 (Gly-805), which are
thought to be necessary for proper orientation of Ser-195
(Ser-805), also are present. Based on homology to chymotrypsin,
three disulfide bonds are predicted to form within the
protease domain at Cys-44-Cys-58, Cys-168-Cys-182, and
Cys-191-Cys-220 (Cys-643-Cys-657, Cys-776-Cys-790, and
Cys-801-Cys-830), and a fourth disulfide bond should form
between the catalytic and the pro-domain Cys-122-Cys-1
(Cys-731-Cys-604), as observed for chymotrypsin. This
predicted disulfide with the pro domain suggests that the active
catalytic domain should still be localized to the cell surface via
a disulfide linkage. The presence of the catalytic machinery
and other conserved structural components described above
suggests that all features necessary for proteolytic activity are
present in the encoded sequence.

Substrate Specificity of the MT-SP1 Protease Domain. The
S1 site specificity (41) of a protease is largely determined by the
amino acid residue at position 189. This position is occupied by
an aspartate in MT-SP1, suggesting that the protease has
specificity for Arg/Lys in the P1 position. In addition, the
presence of a polar Gln-192 (Gln-803), as in trypsin, is
consistent with basic specificity. Furthermore, the presence of
Gly-216 (Gly-827) and Gly-226 (Gly-837) is consistent with the
presence of a deep S1 pocket, unlike elastase, which has
Val-216 and Thr-226 that block the pocket and thereby
contribute to the P1 specificity for small hydrophobic side chains.
The specificity at the other subsites is largely dependent on the
nature of the seven loops A-E and loops 2 and 3 (Fig. 4). Loop
C in enterokinase has a number of positively charged residues
that are thought to interact with the negatively charged
activation site in trypsinogen, Asp-Asp-Asp-Asp-Lys (26). One
known substrate for MT-SP1 (as described below) is the
activation site of MT-SP1, which is Arg-Gln-Ala-Arg (residues
611-614). Loop C contains two Asp residues that may
participate in the recognition of the activation sequence.

One means of obtaining further data on substrate specificity
is by characterization of the activity of the recombinant
proteolytic domain. Enterokinase has been characterized from
both recombinant (38, 42) and native (43, 44) sources. However,
proteolytic activity for the other reported membrane-type
serine proteases hepsin (25) and TMPRSS2 (32) is only
predicted based on sequence homology. To produce active
recombinant MT-SP1, a His-tagged fusion of the protease
domain was cloned into an E. coli vector and expressed and
purified to homogeneity. Fortuitously, the protease domain
refolded and autoactivated after resuspension and purification
from inclusion bodies. This activity, coupled with the lack of
activity in the Ser195Ala (Ser805Ala) variant, demonstrates that
the cDNA encodes a catalytically proficient protease.
Autoactivation of the protease domain at the arginine-valine site
(Arg614-Val615) shows that the protease has Arg/Lys specificity,
as predicted by the sequence homology to other proteases of
basic specificity. Specificity and selectivity are confirmed by
the lack of cleavage of AAPX-pNA substrates that do not have
X = R, K. Further characterization with Spectrozyme tPA
revealed an active enzyme with kcat = 2.6 × 10^[illegible] s^-1. However,
the His-tagged serine protease domain does not cleave
H-Arg-pNA, showing that, unlike trypsin, there is a requirement for
additional subsite occupation for catalytic activity. This
suggests that the enzyme is involved in a regulatory role that
requires selective processing of particular substrates rather
than nonselective degradation.

MT-SP1 Function. In other studies, we have found that
inhibition of serine protease activity by ecotin or ecotin
M84R/M85R inhibits testosterone-induced branching ductal
morphogenesis and enhances apoptosis in a rat ventral
prostate model (F. Elfman, T.T., C.S.C., G. Cunha, and M.A.S.,
unpublished results). Moreover, the rat homolog of MT-SP1 is
expressed in the normal rat ventral prostate (data not shown).
Assays of the protease domain with ecotin and ecotin
M84R/M85R showed that the enzymatic activity is strongly inhibited
(782 ± 92 pM and 9.8 ± 1.5 pM, respectively), suggesting that
rat MT-SP1 is likely to be inhibited at the concentrations of
these inhibitors used in our experiments. MT-SP1 inhibition
may result in the observed inhibition of differentiation and/or
increased apoptosis. Future studies are aimed at definitively
resolving the role of MT-SP1 in prostate differentiation. The
broad expression of MT-SP1 in epithelial tissues is consistent
with the possibility that it is involved in cell maintenance or
growth, perhaps by activating growth factors or by processing
prohormones.

MT-SP1 may participate in a proteolytic cascade that results
in cell growth and/or differentiation. Another structurally
similar membrane-type serine protease, enteropeptidase (Fig.
3), is involved in a proteolytic cascade by which activation of
trypsinogen leads to activation of downstream intestinal
proteases (5). Enteropeptidase is expressed only in the enterocytes
of the proximal small intestine, thus precisely restricting
activation of trypsinogen. Thus, in contrast to secreted
proteases that may diffuse throughout the organism, the
membrane association of MT-SP1 should also allow the proteolytic
activity to be precisely localized, which may be important for
proper physiological function; improper localization of the
enzyme or levels of downstream substrates could lead to
disease.

We have found subcutaneous coinjection of PC-3 cells with
wild-type ecotin or ecotin M84R/M85R led to a decrease in
the primary tumor size compared with animals in whom PC-3
cells and saline were injected (O. Melnyk, T.T., C.S.C., and
M.A.S., unpublished results). Because wild-type ecotin is a
poor micromolar inhibitor of uPA, serine proteases other than
uPA likely are involved in this primary tumor proliferation.
Both wild-type ecotin and ecotin M84R/M85R are potent,
subnanomolar inhibitors of MT-SP1, raising the possibility that
MT-SP1 plays an important role in progression of epithelial
cancers expressing this protease.

Direct biochemical isolation of the substrates may be
possible if MT-SP1 adhesive domains such as the CUB domains or
LDLR repeats interact with the substrates. In addition, likely
substrates may be predicted and tested for by using knowledge
of extended enzyme specificity. For example, the
characterization of the substrate specificity of granzyme B allowed the
prediction and confirmation of substrates for this serine
protease (45). Thus, these complementary studies should further
shed light on the physiological function of this enzyme.

We thank Marion Conn, Robert Maeda, Todd Pray, Ibrahim
Adiguzel, and Ralph Reid for technical assistance and helpful
discussions. T.T. was supported by a National Institutes of Health
postdoctoral fellowship CA71097, and this work was supported by National
Institutes of Health Grant CA72006.

1. Neurath, H. & Walsh, K. A. (1976) Proc. Natl. Acad. Sci. USA 73,
3825-3832.

2. Davie, E. W., Fujikawa, K. & Kisiel, W. (1991) Biochemistry 30,
10363-10370.

3. Chandler, W. L. (1996) Crit. Rev. Oncol. Hematol. 24, 27-45.

4. Reid, K. B. M. & Porter, R. R. (1981) Annu. Rev. Biochem. 50,
433-464.

5. Huber, R. & Bode, W. (1978) Acc. Chem. Res. 11, 114-122.

6. Wang, C.-I., Yang, Q. & Craik, C. S. (1995) J. Biol. Chem. 270,
12250-12256.

7. Yang, S. Q., Wang, C.-I., Gillmor, S. A., Fletterick, R. J. & Craik,
C. S. (1998) J. Mol. Biol. 279, 945-957.



8. Dano, K., Andreasen, P. A., Grondahl-Hansen, J., Kristensen, P.,
Nielsen, L. S. & Skriver, L. (1985) Adv. Cancer Res. 44, 139-266.

9. Andreasen, P. A., Kjoller, L., Christensen, L. & Duffy, M. J.
(1997) Int. J. Cancer 72, 1-22.

10. Kaighn, M. E., Narayan, K. S., Ohnuki, Y., Lechner, J. F. & Jones,
L. W. (1979) Invest. Urol. 17, 16-23.

11. Yoshida, E., Verrusio, E. N., Mihara, H., Oh, D. & Kwaan, H. C.
(1994) Cancer Res. 54, 3300-3304.

12. Sakanari, J. A., Staunton, C. E., Eakin, A. E., Craik, C. S. &
McKerrow, J. H. (1989) Proc. Natl. Acad. Sci. USA 86, 4863-4867.

13. Wiegand, U., Corbach, S., Minn, A., Kang, J. & Muller-Hill, B.
(1993) Gene 136, 167-175.

14. Kang, J., Wiegand, U. & Muller-Hill, B. (1992) Gene 110,
181-187.

15. Borson, N. D., Salo, W. L. & Drewes, L. (1992) PCR Methods
Appl. 2, 144-148.

16. Don, R. H., Cox, P. T., Wainwright, B. J., Baker, K. & Mattick,
J. S. (1991) Nucleic Acids Res. 19, 4008.

17. Ausubel, F. M., Brent, R., Kingston, R. E., Moore, D. D.,
Seidman, J. G., Smith, J. & Struhl, K., eds. (1990) Current
Protocols in Molecular Biology (Wiley, New York).

18. Evnin, L. B., Vasquez, J. R. & Craik, C. S. (1990) Proc. Natl.
Acad. Sci. USA 87, 6659-6663.

19. Unal, A., Pray, T. R., Lagunoff, M., Pennington, M. W., Ganem,
D. & Craik, C. S. (1997) J. Virol. 71, 7030-7038.

20. Jameson, G. W., Roberts, D. V., Adams, R. W., Kyle, W. S. A.
& Elmore, D. T. (1973) Biochem. J. 131, 107-117.

21. Morrison, J. F. (1969) Biochim. Biophys. Acta 185, 269-280.

22. Williams, J. W. & Morrison, J. F. (1979) Methods Enzymol. 63,
437-467.

23. Frohman, M. A. (1993) Methods Enzymol. 218, 340-356.

24. Kozak, M. (1991) J. Cell Biol. 115, 887-903.

25. Leytus, S. P., Loeb, K. R., Hagen, F. S., Kurachi, K. & Davie,
E. W. (1988) Biochemistry 27, 1067-1074.

26. Kitamoto, Y., Yuan, X., Wu, Q., McCourt, D. W. & Sadler, J. E.
(1994) Proc. Natl. Acad. Sci. USA 91, 7588-7592.

27. Bork, P. & Beckmann, G. (1993) J. Mol. Biol. 231, 539-545.

28. Varela, P. F., Romero, A., Sanz, L., Romao, M. J.,
Topfer-Petersen, E. & Calvete, J. J. (1997) J. Mol. Biol. 274, 635-649.

29. Krieger, M. & Herz, J. (1994) Annu. Rev. Biochem. 63, 601-637.

30. Seymour, J. L., Lindquist, R. N., Dennis, M. S., Moffat, B.,
Yansura, D., Reilly, D., Wessinger, M. E. & Lazarus, R. A.
(1994) Biochemistry 33, 3949-3958.

31. Nagase, H. (1997) Biol. Chem. 378, 151-160.



32. Paoloni-Giacobino, A., Chen, H., Peitsch, M. C., Rossier, C. &
Antonarakis, S. E. (1997) Genomics 44, 309-320.

33. Yamaoka, K., Masuda, K., Ogawa, H., Takagi, K., Umemoto, N.
& Yasuoka, S. (1998) J. Biol. Chem. 273, 11895-11901.

34. Kessler, E. & Adar, R. (1989) Eur. J. Biochem. 186, 115-121.

35. Hulmes, D. J. S., Mould, A. P. & Kessler, E. (1997) Matrix Biol.
16, 41-45.

36. Strickland, D. K., Kounnas, M. Z. & Argraves, W. S. (1995) FASEB
J. 9, 890-898.

37. Moestrup, S. K. (1994) Biochim. Biophys. Acta 1197, 197-213.

38. Lu, D., Yuan, X., Zheng, X. & Sadler, J. E. (1997) J. Biol. Chem.
272, 31293-31300.

39. Perona, J. J. & Craik, C. S. (1995) Protein Sci. 4, 337-360.

40. Perona, J. J. & Craik, C. S. (1997) J. Biol. Chem. 272,
29987-29990.

41. Schechter, I. & Berger, A. (1967) Biochem. Biophys. Res. Commun.
27, 157-162.

42. LaVallie, E. R., Rehemtulla, A., Racie, L. A., DiBlasio, E. A.,
Ferenz, C., Grant, K. L., Light, A. & McCoy, J. M. (1993) J. Biol.
Chem. 268, 23311-23317.

43. Light, A. & Fonseca, P. (1984) J. Biol. Chem. 259, 13195-13198.

44. Matsushima, M., Ichinose, M., Yahagi, N., Kakei, N., Tsukada, S.,
Miki, K., Kurokawa, K., Tashiro, K., Shiokawa, K., Shinomiya,
K., et al. (1994) J. Biol. Chem. 269, 19976-19982.

45. Harris, J. L., Peterson, E. P., Hudig, D., Thornberry, N. A. &
Craik, C. S. (1998) J. Biol. Chem. 273, 27364-27373.

46. Nevins, J. R. (1983) Annu. Rev. Biochem. 52, 441-466.

47. Kitamoto, Y., Veile, R. A., Donis-Keller, H. & Sadler, J. E.
(1995) Biochemistry 34, 4562-4568.

48. Beckmann, G. & Bork, P. (1993) Trends Biochem. Sci. 18, 40-41.

49. Emi, M., Nakamura, Y., Ogawa, M., Yamamoto, T., Nishide, T.,
Mori, T. & Matsubara, K. (1986) Gene 41, 305-310.

50. Vanderslice, P., Ballinger, S. M., Tam, E. K., Goldstein, S. M.,
Craik, C. S. & Caughey, G. H. (1990) Proc. Natl. Acad. Sci. USA
87, 3811-3815.

51. Tomita, N., Izumoto, Y., Horii, A., Doi, S., Yokouchi, H., Ogawa,
M., Mori, T. & Matsubara, K. (1989) Biochem. Biophys. Res.
Commun. 158, 569-575.

52. Sudhof, T. C., Goldstein, J. L., Brown, M. S. & Russell, D. W.
(1985) Science 228, 815-822.

53. Wozney, J. M., Rosen, V., Celeste, A. J., Mitsock, L. M., Whitters,
M. J., Kriz, R. W., Hewick, R. M. & Wang, E. A. (1988) Science
242, 1528-1534.

54. Leytus, S. P., Kurachi, K., Sakariassen, K. S. & Davie, E. W.
(1986) Biochemistry 25, 4855-4863.





Exhibit 4 





United States Patent [19]
O'Brien et al.

US005972616A
[11] Patent Number: 5,972,616
[45] Date of Patent: Oct. 26, 1999

[54] TADG-15: AN EXTRACELLULAR SERINE PROTEASE OVEREXPRESSED IN BREAST AND OVARIAN CARCINOMAS

[75] Inventors: Timothy J. O'Brien; Hirotoshi Tanimoto, both of Little Rock, Ark.

[73] Assignee: The Board of Trustees of the University of Arkansas, Little Rock, Ark.

[21] Appl. No.: 09/027,337
[22] Filed: Feb. 20, 1998

[51] Int. Cl.6 C12Q 1/68
[52] U.S. Cl. 435/6; 435/320.1; 435/69.1; 536/23.1; 536/23.5; 530/350
[58] Field of Search 536/23.1, 23.5; 530/350; 435/320.1, 6, 71.2, 69.1, 41, 71.1

Primary Examiner: Sheela Huff
Attorney, Agent, or Firm: Benjamin Aaron Adler

[57] ABSTRACT 

The present invention provides a DNA encoding a TADG-15 
protein selected from the group consisting of: (a) isolated 
DNA which encodes a TADG-15 protein; (b) isolated DNA 
which hybridizes to isolated DNA of (a) above and which 
encodes a TADG-15 protein; and (c) isolated DNA differing 
from the isolated DNAs of (a) and (b) above in codon 
sequence due to the degeneracy of the genetic code, and 
which encodes a TADG-15 protein. Also provided is a vector 
capable of expressing the DNA of the present invention 
adapted for expression in a recombinant cell and regulatory 
elements necessary for expression of the DNA in the cell. 

11 Claims, 17 Drawing Sheets 



U.S. Patent    Oct. 26, 1999    Sheet 1 of 17    5,972,616

FIG. 1



U.S. Patent 



Oct. 26, 1999 



Sheet 2 of 17 



5,972,616 






U.S. Patent 



Oct. 26, 1999



Sheet 3 of 17 



5,972,616 



β-tubulin →




U.S. Patent Oct. 26, 1999 Sheet 4 of 17 5,972,616

[Plot residue; group labels: Normal, LMP, Carcinoma; axis ticks: 0, 0.5, 1, 1.5]

FIG. 4



U.S. Patent 



Oct. 26, 1999 



Sheet 5 of 17 



5,972,616 



[Image residue; labels: tubulin, TADG15]

FIG. 5



U.S. Patent Oct. 26, 1999 Sheet 6 of 17 5,972,616

β-tubulin →

TADG15 →

FIG. 6




U.S. Patent 



Oct. 26, 1999 




Sheet 7 of 17 



5,972,616 



[Image residue; panel titles: FETAL, ADULT; labels: TADG15 3.15 kb, β-tubulin]

* P.B.: Peripheral Blood

FIG. 7



U.S. Patent 



Oct. 26, 1999 



Sheet 8 of 17 



5,972,616 






U.S. Patent 



Oct. 26, 1999 



Sheet 9 of 17 



5,972,616 






U.S. Patent 



Oct. 26, 1999 



Sheet 10 of 17 



5,972,616 






U.S. Patent 



Oct. 26, 1999 



Sheet 11 of 17 



5,972,616 






U.S. Patent Oct. 26, 1999 Sheet 12 of 17 5,972,616



1 

51 
101 
151 
201 
251 
301 
351 
401 
451 
501 
551 
601 
651 
701 
751 
801 
851 



MGSDRARKGG GGPKDFGAGL KYNSRHEKVN GLEEGVEFLP VNNVKKVEKH 



GPGHWVVLAA VLIGLLLVLL GIGFLVW HLQ YRDVRVQKVF NGYMRITNEN 

KDALKLLYSG VPFLGPYHKE SAVTAFSEGS 



FVPAYENS tNS Tt CFVSLASKV 
VIAYYWSEFS IPQHLVEEAE 



RVMAEERVVM LPPRARSLKS FVVTSVVAFP 



TDSKTVQRTQ DNSCSFGLHA RGVELMRFTT PGFPDSPYPA HARCQWALRG 



DADSVLSLTF RSFDLASCDE 

y^ltJthssqn VLLITLITNT 

FNSPYYPGHY PPNIDCTWNI 

YVEINGEKYC GERSQFVVTS 



RGSDLVTVYN 
ERRHPGFEAT 
EVPNNQHVKV 
NSNKITVRFH 



TLSPMEPHAL VQLCGTYPPS 

FFQLPRMSSC GGRLRKAQGT 

* 

SFKFFYLLEP GVPAGTCPKD 



SDQSYTDTGF LAEYLSY 



DSS 



DP 



PGQFTCR TGRCIRKELR CDGWADCTDH (sDeJlNCSCDA GHQFTCKNKF 
CKPLFWVCDS VNDCGDN^DE] QGCSCPAQTF RCSNGKCLSK SQQCNGKDDC 
GDC feDEh SCP KVNVVTCTKH TYRCLNGLCL SKGNPECDGK EDCSDG{sDe}< 



DC 



DCGLRSFT RQAI^VGGTD ADEGEWPWQV SLHALGQGHI CGASLISPNW 
LVSA/^YID DRGFRYSDPT QWTAFLGLHD QSQRSAPGVQ ERRLKRIISH 
PFFNDFTFDY ©LALLELEKP AEYSSMVRPI CLPDASHVFP AGKAIWVTGW 
GHTQYGGTGA LILQKGEIRV INQTTCENLL PQQITPRMMC VGFLSGGVDS 
CQGC@ GGPLS SVEADGRIFQ AGVVSWGDGC AQRNKPGVYT RLPLFRDWIK 



ENTGV 



(SEQ ID NO: 2)

Conserved cysteine residue

NXT: Possible N-linked glycosylation site

SDE: Conserved SDE motif

Q: Potential cleavage site

Conserved amino acids of catalytic triad H, D, S

1. Cytoplasmic domain

2. Transmembrane domain

3. CUB repeat

4. Ligand-binding repeat (class A motif) of LDL receptor-like domain

5. Serine protease



FIG. 10 
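
The legend above flags possible N-linked glycosylation sites in the amino acid sequence. As an illustrative aid only, and not part of the patent disclosure, the following Python sketch shows how such candidate sites can be located in a protein sequence. It assumes the commonly used N-X-S/T consensus (asparagine, then any residue other than proline, then serine or threonine); the criterion used to mark the figure may differ. The function name `find_sequons` and the test peptide are hypothetical.

```python
import re

# Assumed consensus: N, then any residue except P, then S or T.
# The lookahead lets overlapping candidate sites be reported as well.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return 1-based positions of the asparagine in each N-X-S/T match."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Hypothetical peptide: N-G-T at position 3 and N-L-S at position 11 match;
# N-P-S at position 7 is excluded because X is proline.
peptide = "AANGTKNPSQNLSA"
print(find_sequons(peptide))  # [3, 11]
```

A match indicates only that the local sequence fits the consensus; whether a given site is actually glycosylated is a separate experimental question.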



U.S. Patent 



Oct. 26, 1999 



Sheet 13 of 17 



5,972,616 






U.S. Patent 



Oct. 26, 1999 



Sheet 14 of 17 



5,972,616 






U.S. Patent 



Oct. 26, 1999 



Sheet 15 of 17 



5,972,616 

















U.S. Patent 



Oct. 26, 1999 




Sheet 16 of 17 



5,972,616 






U.S. Patent    Oct. 26, 1999    Sheet 17 of 17    5,972,616




TADG-15: AN EXTRACELLULAR SERINE 
PROTEASE OVEREXPRESSED IN BREAST 
AND OVARIAN CARCINOMAS 

BACKGROUND OF THE INVENTION

1. Field of the Invention 

The present invention relates generally to the fields of
cellular biology and the diagnosis of neoplastic disease.
More specifically, the present invention relates to an
extracellular serine protease termed Tumor Antigen Derived
Gene-15 (TADG-15), which is overexpressed in breast and
ovarian carcinomas.

2. Description of the Related Art 

Extracellular proteases have been directly associated with
tumor growth, shedding of tumor cells and invasion of target
organs. Individual classes of proteases are involved in, but
not limited to, (1) the digestion of stroma surrounding the
initial tumor area; (2) the digestion of the cellular adhesion
molecules to allow dissociation of tumor cells; and (3) the
invasion of the basement membrane for metastatic growth
and the activation of both tumor growth factors and
angiogenic factors.

The prior art is deficient in the lack of effective means of 
screening to identify proteases overexpressed in carcinoma. 
The present invention fulfills this longstanding need and 
desire in the art. 

SUMMARY OF THE INVENTION

The present invention discloses a screening program to
identify proteases overexpressed in carcinoma by examining
PCR products amplified using differential display in early
stage tumors and metastatic tumors compared to those of
normal tissues.

In one embodiment of the present invention, there is 
provided a DNA encoding a TADG-15 protein selected from 
the group consisting of: (a) isolated DNA which encodes a 
TADG-15 protein; (b) isolated DNA which hybridizes to 
isolated DNA of (a) above and which encodes a TADG-15
protein; and (c) isolated DNA differing from the isolated 
DNAs of (a) and (b) above in codon sequence due to the 
degeneracy of the genetic code, and which encodes a TADG- 
15 protein. 

In another embodiment of the present invention, there is
provided a vector capable of expressing the DNA of the 
present invention adapted for expression in a recombinant 
cell and regulatory elements necessary for expression of the 
DNA in the cell. 

In yet another embodiment of the present invention, there 
is provided a host cell transfected with the vector of the 
present invention, the vector expressing a TADG-15 protein. 

In still yet another embodiment of the present invention, 
there is provided a method of detecting expression of a 
TADG-15 mRNA, comprising the steps of: (a) contacting 
mRNA obtained from the cell with the labeled hybridization 
probe; and (b) detecting hybridization of the probe with the 
mRNA. 

Other and further aspects, features, and advantages of the
present invention will be apparent from the following 
description of the presently preferred embodiments of the 
invention given for the purpose of disclosure. 

BRIEF DESCRIPTION OF THE DRAWINGS 


So that the matter in which the above-recited features, 
advantages and objects of the invention, as well as others
which will become clear, are attained and can be understood
in detail, more particular descriptions of the invention 
briefly summarized above may be had by reference to 
certain embodiments thereof which are illustrated in the 
appended drawings. These drawings form a part of the 
specification. It is to be noted, however, that the appended 
drawings illustrate preferred embodiments of the invention 
and therefore are not to be considered limiting in their scope. 

FIG. 1 shows a comparison of PCR products derived from
normal and breast carcinoma cDNA as shown by staining in
an agarose gel.

FIG. 2 shows a comparison of the serine protease catalytic
domain of TADG-15 with hepsin (Heps, SEQ ID No: 3),
(Scce, SEQ ID No: 4), trypsin (Try, SEQ ID No: 5),
chymotrypsin (Chymb, SEQ ID No: 6), factor 7 (Fac7, SEQ
ID No: 7) and tissue plasminogen activator (Tpa, SEQ ID
No: 8). The asterisks indicate conserved amino acids of the
catalytic triad.

FIG. 3 shows quantitative PCR analysis of TADG-15
expression. 

FIG. 4 shows the ratio of TADG-15 expression to expression
of β-tubulin in normal tissues, low malignant potential
tumors (LMP) and carcinomas.

FIG. 5 shows the TADG-15 expression in tumor cell lines 
derived from both ovarian and breast carcinoma tissues. 

FIG. 6 shows the overexpression of TADG-15 in other 
tumor tissues. 

FIG. 7 shows the Northern blots of TADG-15 expression 
in ovarian carcinomas, fetal and normal adult tissues. 

FIG. 8 shows a diagram of the TADG-15 transcript and 
the clones with the origin of their derivation. 

FIG. 9 shows nucleotide sequence of the TADG-15 cDNA 
(SEQ ID No: 1) and amino acid sequence of the TADG-15 
protein (SEQ ID No: 2).

FIG. 10 shows the amino acid sequence of the TADG-15 
protease including functional sites and domains. 

FIG. 11 shows a structure diagram of the TADG-15 
protein including functional domains. 

FIG. 12 shows a nucleotide sequence comparison 
between TADG-15 and human SNC-19 (GenBank accession
#U20428).

DETAILED DESCRIPTION OF THE 
INVENTION 

As used herein, the term "cDNA" shall refer to the DNA 
copy of the mRNA transcript of a gene. 

As used herein, the term "derived amino acid sequence" 
shall mean the amino acid sequence determined by reading 
the triplet sequence of nucleotide bases in the cDNA. 

As used herein the term "screening a library" shall refer 
to the process of using a labeled probe to check whether, 
under the appropriate conditions, there is a sequence 
complementary to the probe present in a particular DNA 
library. In addition, "screening a library" could be performed
by PCR.

As used herein, the term "PCR" refers to the polymerase
chain reaction that is the subject of U.S. Pat. Nos. 4,683,195 
and 4,683,202 to Mullis, as well as other improvements now 
known in the art. 

The TADG-15 cDNA is 3147 base pairs long (SEQ ID
No:1) and encodes an 855 amino acid protein (SEQ ID
No:2). The availability of the TADG-15 gene opens the way
for a number of studies that can lead to various applications.
For example, the TADG-15 gene can be used as a diagnostic
or therapeutic target in ovarian carcinoma and other
carcinomas including breast, prostate, lung and colon.
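
As a quick consistency check on the two lengths stated above, the arithmetic can be sketched as follows. This is an illustration only: it assumes a single uninterrupted open reading frame terminated by one stop codon, and the resulting split between coding and untranslated nucleotides is not taken from the specification (the actual positions are those of SEQ ID No: 1).

```python
# Sketch only: assumes one open reading frame ending in a single stop codon.
CDNA_LENGTH_BP = 3147      # cDNA length stated in the text
PROTEIN_LENGTH_AA = 855    # protein length stated in the text


def coding_nucleotides(protein_length_aa: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode a protein: 3 per residue, plus a stop codon."""
    codons = protein_length_aa + (1 if include_stop else 0)
    return 3 * codons


coding = coding_nucleotides(PROTEIN_LENGTH_AA)
untranslated = CDNA_LENGTH_BP - coding  # nucleotides left for 5' and 3' untranslated regions
print(coding, untranslated)
```

Under these assumptions the coding region accounts for 2568 nucleotides, which fits within the stated 3147 base pairs.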

In accordance with the present invention there may be 
employed conventional molecular biology, microbiology, 
and recombinant DNA techniques within the skill of the art. 
Such techniques are explained fully in the literature. See, 
e.g., Maniatis, Fritsch & Sambrook, "Molecular Cloning: A 
Laboratory Manual" (1982); "DNA Cloning: A Practical 
Approach," Volumes I and II (D. N. Glover ed. 1985);
"Oligonucleotide Synthesis" (M. J. Gait ed. 1984); "Nucleic
Acid Hybridization" [B. D. Hames & S. J. Higgins eds.
(1985)]; "Transcription and Translation" [B. D. Hames & S.
J. Higgins eds. (1984)]; "Animal Cell Culture" [R. I.
Freshney, ed. (1986)]; "Immobilized Cells And Enzymes" 
[IRL Press, (1986)]; B. Perbal, "A Practical Guide To 
Molecular Cloning" (1984). 

Therefore, if appearing herein, the following terms shall 
have the definitions set out below. 

The amino acids described herein are preferred to be in the
"L" isomeric form. However, residues in the "D" isomeric
form can be substituted for any L-amino acid residue, as
long as the desired functional property of immunoglobulin-
binding is retained by the polypeptide. NH2 refers to the free
amino group present at the amino terminus of a polypeptide.
COOH refers to the free carboxy group present at the
carboxy terminus of a polypeptide. In keeping with standard
polypeptide nomenclature, J. Biol. Chem., 243:3552-59
(1969), abbreviations for amino acid residues are shown in
the following Table of Correspondence:



TABLE OF CORRESPONDENCE

        SYMBOL
1-Letter    3-Letter    AMINO ACID

Y           Tyr         tyrosine
G           Gly         glycine
F           Phe         phenylalanine
M           Met         methionine
A           Ala         alanine
S           Ser         serine
I           Ile         isoleucine
L           Leu         leucine
T           Thr         threonine
V           Val         valine
P           Pro         proline
K           Lys         lysine
H           His         histidine
Q           Gln         glutamine
E           Glu         glutamic acid
W           Trp         tryptophan
R           Arg         arginine
D           Asp         aspartic acid
N           Asn         asparagine
C           Cys         cysteine



It should be noted that all amino-acid residue sequences 
are represented herein by formulae whose left and right 
orientation is in the conventional direction of amino- 
terminus to carboxy-terminus. Furthermore, it should be 
noted that a dash at the beginning or end of an amino acid 
residue sequence indicates a peptide bond to a further 
sequence of one or more amino-acid residues. The above 
Table is presented to correlate the three-letter and one-letter 
notations which may appear alternately herein. 
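
The correspondence between the two notations can be sketched as a small lookup, shown here only as an illustration. The function names are hypothetical, the codes are the standard one-letter and three-letter symbols for the twenty amino acids in the table, and hyphens between three-letter codes are an assumed display convention.

```python
# One-letter to three-letter symbols for the twenty amino acids in the table.
ONE_TO_THREE = {
    "Y": "Tyr", "G": "Gly", "F": "Phe", "M": "Met", "A": "Ala",
    "S": "Ser", "I": "Ile", "L": "Leu", "T": "Thr", "V": "Val",
    "P": "Pro", "K": "Lys", "H": "His", "Q": "Gln", "E": "Glu",
    "W": "Trp", "R": "Arg", "D": "Asp", "N": "Asn", "C": "Cys",
}
THREE_TO_ONE = {three: one for one, three in ONE_TO_THREE.items()}


def to_three_letter(sequence: str) -> str:
    """Convert a one-letter sequence (amino to carboxy terminus) to three-letter codes."""
    return "-".join(ONE_TO_THREE[residue] for residue in sequence.upper())


def to_one_letter(sequence: str) -> str:
    """Convert hyphen-separated three-letter codes back to one-letter notation."""
    return "".join(THREE_TO_ONE[code.capitalize()] for code in sequence.split("-"))


print(to_three_letter("GSM"))
print(to_one_letter("Gly-Ser-Met"))
```

Both functions preserve the left-to-right, amino-terminus to carboxy-terminus orientation described above.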

A "replicon" is any genetic element (e.g., plasmid, 
chromosome, virus) that functions as an autonomous unit of 
DNA replication in vivo; i.e., capable of replication under its 
own control. 

A "vector" is a replicon, such as plasmid, phage or
cosmid, to which another DNA segment may be attached so 
as to bring about the replication of the attached segment. 






A "DNA molecule" refers to the polymeric form of
deoxyribonucleotides (adenine, guanine, thymine, or
cytosine) in either its single-stranded form or a
double-stranded helix. This term refers only to the primary and
secondary structure of the molecule, and does not limit it to
any particular tertiary forms. Thus, this term includes
double-stranded DNA found, inter alia, in linear DNA
molecules (e.g., restriction fragments), viruses, plasmids,
and chromosomes. The structure is discussed herein
according to the normal convention of giving only the sequence in
the 5' to 3' direction along the nontranscribed strand of DNA
(i.e., the strand having a sequence homologous to the
mRNA).
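
The strand convention described above can be illustrated with a short sketch. The function names and the example sequence are hypothetical; the code assumes an unambiguous A/C/G/T alphabet and simply shows how the nontranscribed strand relates to the mRNA and to the transcribed (template) strand.

```python
# Sketch only: assumes sequences contain just the letters A, C, G and T.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mrna_from_nontranscribed(nontranscribed: str) -> str:
    """The mRNA has the same 5'-to-3' sequence as the nontranscribed strand, with U for T."""
    return nontranscribed.upper().replace("T", "U")


def template_strand(nontranscribed: str) -> str:
    """The transcribed strand is the reverse complement, also written 5' to 3'."""
    return nontranscribed.upper().translate(_COMPLEMENT)[::-1]


example = "ATGGCC"  # hypothetical fragment, written 5' to 3'
print(mrna_from_nontranscribed(example))
print(template_strand(example))
```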

An "origin of replication" refers to those DNA sequences 
that participate in DNA synthesis. 

A DNA "coding sequence" is a double-stranded DNA 
sequence which is transcribed and translated into a polypep- 
tide in vivo when placed under the control of appropriate 
regulatory sequences. The boundaries of the coding 
sequence are determined by a start codon at the 5' (amino) 
terminus and a translation stop codon at the 3' (carboxyl)
terminus. A coding sequence can include, but is not limited 
to, prokaryotic sequences, cDNA from eukaryotic mRNA, 
genomic DNA sequences from eukaryotic (e.g., 
mammalian) DNA, and even synthetic DNA sequences. A 
polyadenylation signal and transcription termination 
sequence will usually be located 3' to the coding sequence. 
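
A minimal sketch of locating coding-sequence boundaries as defined above is given below. It is illustrative only and rests on assumptions not stated in the text: the start codon is taken to be ATG, the stop codons are those of the standard genetic code, and the input is the nontranscribed strand written 5' to 3'. The function name is hypothetical.

```python
# Sketch only: ATG start and standard-code stop codons are assumptions.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_coding_sequence(dna: str):
    """Return (start, end) of the first ATG-initiated frame that reaches a stop codon.

    start is the index of the A of ATG; end is one past the last base of the
    stop codon. Returns None when no such frame exists.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        # Walk codon by codon, in frame with this ATG, looking for a stop.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                return start, pos + 3
        start = dna.find("ATG", start + 1)
    return None


bounds = find_coding_sequence("CCATGGCTTAAGG")  # hypothetical fragment
print(bounds)
```

Real cDNA analysis would also weigh sequence context around the start codon and frame length; this sketch only demonstrates the boundary definition.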

Transcriptional and translational control sequences are 
DNA regulatory sequences, such as promoters, enhancers, 
polyadenylation signals, terminators, and the like, that pro- 
vide for the expression of a coding sequence in a host cell. 

A "promoter sequence" is a DNA regulatory region 
capable of binding RNA polymerase in a cell and initiating 
transcription of a downstream (3' direction) coding
sequence. For purposes of defining the present invention, the 
promoter sequence is bounded at its 3' terminus by the 
transcription initiation site and extends upstream (5' 
direction) to include the minimum number of bases or 
elements necessary to initiate transcription at levels detect- 
able above background. Within the promoter sequence will 
be found a transcription initiation site, as well as protein 
binding domains (consensus sequences) responsible for the 
binding of RNA polymerase. Eukaryotic promoters often, 
but not always, contain "TATA" boxes and "CAT" boxes. 
Prokaryotic promoters contain Shine-Dalgarno sequences in
addition to the -10 and -35 consensus sequences. 

An "expression control sequence" is a DNA sequence that 
controls and regulates the transcription and translation of 
another DNA sequence. A coding sequence is "under the 
control" of transcriptional and translational control 
sequences in a cell when RNA polymerase transcribes the 
coding sequence into mRNA, which is then translated into 
the protein encoded by the coding sequence.

A "signal sequence" can be included near the coding 
sequence. This sequence encodes a signal peptide, 
N-terminal to the polypeptide, that communicates to the host 
cell to direct the polypeptide to the cell surface or secrete the 
polypeptide into the media, and this signal peptide is clipped 
off by the host cell before the protein leaves the cell. Signal 
sequences can be found associated with a variety of proteins 
native to prokaryotes and eukaryotes. 

The term "oligonucleotide", as used herein in referring to 
the probe of the present invention, is defined as a molecule 
comprised of two or more ribonucleotides, preferably more 
than three. Its exact size will depend upon many factors 
which, in turn, depend upon the ultimate function and use of
the oligonucleotide. 






The term "primer" as used herein refers to an
oligonucleotide, whether occurring naturally as in a purified
restriction digest or produced synthetically, which is capable
of acting as a point of initiation of synthesis when placed
under conditions in which synthesis of a primer extension
product, which is complementary to a nucleic acid strand, is
induced, i.e., in the presence of nucleotides and an inducing
agent such as a DNA polymerase and at a suitable
temperature and pH. The primer may be either single-stranded or
double-stranded and must be sufficiently long to prime the
synthesis of the desired extension product in the presence of
the inducing agent. The exact length of the primer will
depend upon many factors, including temperature, source of
primer and use of the method. For example, for diagnostic
applications, depending on the complexity of the target
sequence, the oligonucleotide primer typically contains
15-25 or more nucleotides, although it may contain fewer
nucleotides.

The primers herein are selected to be "substantially"
complementary to different strands of a particular target
DNA sequence. This means that the primers must be
sufficiently complementary to hybridize with their respective
strands. Therefore, the primer sequence need not reflect the
exact sequence of the template. For example, a
non-complementary nucleotide fragment may be attached to the
5' end of the primer, with the remainder of the primer
sequence being complementary to the strand. Alternatively,
non-complementary bases or longer sequences can be
interspersed into the primer, provided that the primer sequence
has sufficient complementarity with the sequence to hybridize
therewith and thereby form the template for the synthesis of
the extension product.

As used herein, the terms "restriction endonucleases" and
"restriction enzymes" refer to enzymes, each of which cuts
double-stranded DNA at or near a specific nucleotide
sequence.

A cell has been "transformed" by exogenous or heterologous
DNA when such DNA has been introduced inside the
cell. The transforming DNA may or may not be integrated
(covalently linked) into the genome of the cell. In
prokaryotes, yeast, and mammalian cells for example, the
transforming DNA may be maintained on an episomal
element such as a plasmid. With respect to eukaryotic cells,
a stably transformed cell is one in which the transforming
DNA has become integrated into a chromosome so that it is
inherited by daughter cells through chromosome replication.
This stability is demonstrated by the ability of the eukaryotic
cell to establish cell lines or clones comprised of a
population of daughter cells containing the transforming DNA. A
"clone" is a population of cells derived from a single cell or
ancestor by mitosis. A "cell line" is a clone of a primary cell
that is capable of stable growth in vitro for many
generations.

Two DNA sequences are "substantially homologous"
when at least about 75% (preferably at least about 80%, and
most preferably at least about 90% or 95%) of the
nucleotides match over the defined length of the DNA sequences.
Sequences that are substantially homologous can be
identified by comparing the sequences using standard software
available in sequence data banks, or in a Southern
hybridization experiment under, for example, stringent conditions
as defined for that particular system. Defining appropriate
hybridization conditions is within the skill of the art. See,
e.g., Maniatis et al., supra; DNA Cloning, Vols. I & II, supra;
Nucleic Acid Hybridization, supra.
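
The percentage test in this definition can be sketched as below. This is a simplified illustration, not the comparison software the text refers to: it assumes the two sequences are already aligned, of equal length and ungapped, and it treats "at least about 75%" as a plain greater-than-or-equal comparison. The function names are hypothetical.

```python
# Sketch only: assumes pre-aligned, equal-length, ungapped sequences.
def percent_match(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which the two sequences carry the same nucleotide."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def is_substantially_homologous(seq_a: str, seq_b: str, threshold: float = 75.0) -> bool:
    """Apply a threshold such as 75, 80, 90 or 95 percent over the defined length."""
    return percent_match(seq_a, seq_b) >= threshold


print(percent_match("ACGTACGT", "ACGTACGA"))
print(is_substantially_homologous("ACGTACGT", "ACGTACGA", threshold=90.0))
```

In practice an alignment step would precede this calculation so that insertions and deletions are accounted for.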

A "heterologous" region of the DNA construct is an
identifiable segment of DNA within a larger DNA molecule
that is not found in association with the larger molecule in
nature. Thus, when the heterologous region encodes a
mammalian gene, the gene will usually be flanked by DNA that
does not flank the mammalian genomic DNA in the genome
of the source organism. In another example, the coding
sequence is a construct where the coding sequence itself is
not found in nature (e.g., a cDNA where the genomic coding
sequence contains introns, or synthetic sequences having
codons different than the native gene). Allelic variations or
naturally-occurring mutational events do not give rise to a
heterologous region of DNA as defined herein.

The labels most commonly employed for these studies are 
radioactive elements, enzymes, chemicals which fluoresce 
when exposed to ultraviolet light, and others. A number of 
fluorescent materials are known and can be utilized as labels. 
These include, for example, fluorescein, rhodamine, 
auramine, Texas Red, AMCA blue and Lucifer Yellow. A 
particular detecting material is anti-rabbit antibody prepared 
in goats and conjugated with fluorescein through an isothio- 
cyanate. 

Proteins can also be labeled with a radioactive element or 
with an enzyme. The radioactive label can be detected by 
any of the currently available counting procedures. The 
preferred isotope may be selected from ³H, ¹⁴C, ³²P, ³⁵S,
³⁶Cl, ⁵¹Cr, ⁵⁷Co, ⁵⁸Co, ⁵⁹Fe, ⁹⁰Y, ¹²⁵I, ¹³¹I, and ¹⁸⁶Re.

Enzyme labels are likewise useful, and can be detected by
any of the presently utilized colorimetric,
spectrophotometric, fluorospectrophotometric, amperometric
or gasometric techniques. The enzyme is conjugated to
the selected particle by reaction with bridging molecules
such as carbodiimides, diisocyanates, glutaraldehyde and
the like. Many enzymes which can be used in these
procedures are known and can be utilized. The preferred are
peroxidase, β-glucuronidase, β-D-glucosidase,
β-D-galactosidase, urease, glucose oxidase plus peroxidase and
alkaline phosphatase. U.S. Pat. Nos. 3,654,090, 3,850,752,
and 4,016,043 are referred to by way of example for their
disclosure of alternate labeling material and methods.

A particular assay system developed and utilized in the art 
is known as a receptor assay. In a receptor assay, the material 
to be assayed is appropriately labeled and then certain 
cellular test colonies are inoculated with a quantity of both
the label after which binding studies are conducted to 
determine the extent to which the labeled material binds to 
the cell receptors. In this way, differences in affinity between 
materials can be ascertained. 

An assay useful in the art is known as a "cis/trans" assay. 
Briefly, this assay employs two genetic constructs, one of 
which is typically a plasmid that continually expresses a 
particular receptor of interest when transfected into an 
appropriate cell line, and the second of which is a plasmid 
that expresses a reporter such as luciferase, under the control 
of a receptor/ligand complex. Thus, for example, if it is 
desired to evaluate a compound as a ligand for a particular
receptor, one of the plasmids would be a construct that 
results in expression of the receptor in the chosen cell line, 
while the second plasmid would possess a promoter linked 
to the luciferase gene in which the response element to the 
particular receptor is inserted. If the compound under test is 
an agonist for the receptor, the ligand will complex with the 
receptor, and the resulting complex will bind the response 
element and initiate transcription of the luciferase gene. The 
resulting chemiluminescence is then measured 
photometrically, and dose response curves are obtained and 
compared to those of known ligands. The foregoing protocol 
is described in detail in U.S. Pat. No. 4,981,784. 






As used herein, the term "host" is meant to include not
only prokaryotes but also eukaryotes such as yeast, plant and
animal cells. A recombinant DNA molecule or gene which
encodes a human TADG-15 protein of the present invention
can be used to transform a host using any of the techniques
commonly known to those of ordinary skill in the art.
Especially preferred is the use of a vector containing coding
sequences for the gene which encodes a human TADG-15
protein of the present invention for purposes of prokaryote
transformation. Prokaryotic hosts may include E. coli, S.
typhimurium, Serratia marcescens and Bacillus subtilis.
Eukaryotic hosts include yeasts such as Pichia pastoris,
mammalian cells and insect cells.

In general, expression vectors containing promoter
sequences which facilitate the efficient transcription of the
inserted DNA fragment are used in connection with the host.
The expression vector typically contains an origin of
replication, promoter(s), terminator(s), as well as specific
genes which are capable of providing phenotypic selection
in transformed cells. The transformed hosts can be
fermented and cultured according to means known in the art to
achieve optimal cell growth.

The invention includes a substantially pure DNA encod- 
ing a TADG-15 protein, a strand of which DNA will 
hybridize at high stringency to a probe containing a 
sequence of at least 15 consecutive nucleotides of (SEQ ID 
NO: 1). The protein encoded by the DNA of this invention 
may share at least 80% sequence identity (preferably 85%, 
more preferably 90%, and most preferably 95%) with the 
amino acids listed in FIG. 10 (SEQ ID NO:2). More 
preferably, the DNA includes the coding sequence of the 
nucleotides of FIG. 9 (SEQ ID NO: 1), or a degenerate 
variant of such a sequence. 

The probe to which the DNA of the invention hybridizes 
preferably consists of a sequence of at least 20 consecutive 
nucleotides, more preferably 40 nucleotides, even more 
preferably 50 nucleotides, and most preferably 100 nucle- 
otides or more (up to 100%) of the coding sequence of the 
nucleotides listed in FIG. 9 (SEQ ID NO:1) or the
complement thereof. Such a probe is useful for detecting expression
of TADG-15 in a human cell by a method including the steps 
of (a) contacting mRNA obtained from the cell with the 
labeled hybridization probe; and (b) detecting hybridization 
of the probe with the mRNA. 

This invention also includes a substantially pure DNA 
containing a sequence of at least 15 consecutive nucleotides 
(preferably 20, more preferably 30, even more preferably 50, 
and most preferably all) of the region from nucleotides 1 to 
3147 of the nucleotides listed in FIG. 9 (SEQ ID NO: 1).

By "high stringency" is meant DNA hybridization and 
wash conditions characterized by high temperature and low 
salt concentration, e.g., wash conditions of 65° C. at a salt
concentration of approximately 0.1×SSC, or the functional
equivalent thereof. For example, high stringency conditions
may include hybridization at about 42° C. in the presence of
about 50% formamide; a first wash at about 65° C. with
about 2×SSC containing 1% SDS; followed by a second
wash at about 65° C. with about 0.1×SSC.

By "substantially pure DNA" is meant DNA that is not
part of a milieu in which the DNA naturally occurs, by virtue 
of separation (partial or total purification) of some or all of 
the molecules of that milieu, or by virtue of alteration of 
sequences that flank the claimed DNA. The term therefore 
includes, for example, a recombinant DNA which is incorporated
into a vector, into an autonomously replicating
plasmid or virus, or into the genomic DNA of a prokaryote
or eukaryote; or which exists as a separate molecule (e.g., a
cDNA or a genomic or cDNA fragment produced by polymerase
chain reaction (PCR) or restriction endonuclease
digestion) independent of other sequences. It also includes a 
recombinant DNA which is part of a hybrid gene encoding 
additional polypeptide sequence, e.g., a fusion protein. Also 
included is a recombinant DNA which includes a portion of 
the nucleotides listed in FIG. 9 (SEQ ID NO: 1) which 
encodes an alternative splice variant of TADG-15. 

The DNA may have at least about 70% sequence identity 
to the coding sequence of the nucleotides listed in FIG. 9 
(SEQ ID NO: 1), preferably at least 75% (e.g. at least 80%);
and most preferably at least 90%. The identity between two 
sequences is a direct function of the number of matching or 
identical positions. When a subunit position in both of the 
two sequences is occupied by the same monomeric subunit, 
e.g., if a given position is occupied by an adenine in each of 
two DNA molecules, then they are identical at that position. 
For example, if 7 positions in a sequence 10 nucleotides in
length are identical to the corresponding positions in a 
second 10-nucleotide sequence, then the two sequences have 
70% sequence identity. The length of comparison sequences 
will generally be at least 50 nucleotides, preferably at least 
60 nucleotides, more preferably at least 75 nucleotides, and 
most preferably 100 nucleotides. Sequence identity is typically
measured using sequence analysis software (e.g.,
Sequence Analysis Software Package of the Genetics Com- 
puter Group, University of Wisconsin Biotechnology 
Center, 1710 University Avenue, Madison, Wis. 53705). 
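
The position-by-position definition of sequence identity given above can be illustrated with a minimal sketch. It assumes the two sequences are already aligned and of equal length, with no gaps; the function name and the two 10-nucleotide test sequences are hypothetical and are not taken from the specification or from the cited software package.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions occupied by the same nucleotide.

    Assumption: the sequences are pre-aligned and equal in length;
    gaps and local alignment are outside the scope of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


# Hypothetical 10-nucleotide sequences that agree at 7 positions.
first = "ATGACAGAGG"
second = "ATGACAGTCC"
print(percent_identity(first, second))
```

With 7 matching positions out of 10, the sketch reproduces the 70% figure of the worked example in the text.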

The present invention comprises a vector comprising a 
DNA sequence which encodes a human TADG-15 protein 
and said vector is capable of replication in a host which 
comprises, in operable linkage: a) an origin of replication; b) 
a promoter; and c) a DNA sequence coding for said protein. 
Preferably, the vector of the present invention contains a 
portion of the DNA sequence shown in SEQ ID No: 1. A
"vector" may be defined as a replicable nucleic acid 
construct, e.g., a plasmid or viral nucleic acid. Vectors may 
be used to amplify and/or express nucleic acid encoding 
TADG-15 protein. An expression vector is a replicable 
construct in which a nucleic acid sequence encoding a 
polypeptide is operably linked to suitable control sequences 
capable of effecting expression of the polypeptide in a cell. 
The need for such control sequences will vary depending
upon the cell selected and the transformation method cho- 
sen. 

Generally, control sequences include a transcriptional 
promoter and/or enhancer, suitable mRNA ribosomal bind- 
ing sites, and sequences which control the termination of 
transcription and translation. Methods which are well known 
to those skilled in the art can be used to construct expression 
vectors containing appropriate transcriptional and transla- 
tional control signals. See for example, the techniques 
described in Sambrook et al., 1989, Molecular Cloning: A 
Laboratory Manual (2nd Ed.), Cold Spring Harbor Press, 
N.Y. A gene and its transcription control sequences are 
defined as being "operably linked" if the transcription con- 
trol sequences effectively control the transcription of the 
gene. Vectors of the invention include, but are not limited to, 
plasmid vectors and viral vectors. Preferred viral vectors of 
the invention are those derived from retroviruses, 
adenovirus, adeno-associated virus, SV40 virus, or herpes
viruses. 

By a "substantially pure protein" is meant a protein which 
has been separated from at least some of those components 
which naturally accompany it. Typically, the protein is 
substantially pure when it is at least 60%, by weight, free 



5,972,616 






from the proteins and other naturally-occurring organic 
molecules with which it is naturally associated in vivo.
Preferably, the purity of the preparation is at least 75%, more 
preferably at least 90%, and most preferably at least 99%, by 
weight. A substantially pure TADG-15 protein may be 
obtained, for example, by extraction from a natural source; 
by expression of a recombinant nucleic acid encoding a
TADG-15 polypeptide; or by chemically synthesizing the 
protein. Purity can be measured by any appropriate method, 
e.g., column chromatography such as immunoaffinity chromatography
using an antibody specific for TADG-15, polyacrylamide
gel electrophoresis, or HPLC analysis. A protein
is substantially free of naturally associated components 
when it is separated from at least some of those contami- 
nants which accompany it in its natural state. Thus, a protein 
which is chemically synthesized or produced in a cellular 
system different from the cell from which it naturally 
originates will be, by definition, substantially free from its 
naturally associated components. Accordingly, substantially 
pure proteins include eukaryotic proteins synthesized in E. 
coli, other prokaryotes, or any other organism in which they
do not naturally occur. 

In addition to substantially full-length proteins, the inven- 
tion also includes fragments (e.g., antigenic fragments) of 
the TADG-15 protein (SEQ ID No:2). As used herein, 
"fragment," as applied to a polypeptide, will ordinarily be at 
least 10 residues, more typically at least 20 residues, and 
preferably at least 30 (e.g., 50) residues in length, but less 
than the entire, intact sequence. Fragments of the TADG-15 
protein can be generated by methods known to those skilled 
in the art, e.g., by enzymatic digestion of naturally occurring 
or recombinant TADG-15 protein, by recombinant DNA 
techniques using an expression vector that encodes a defined 
fragment of TADG-15, or by chemical synthesis. The ability 
of a candidate fragment to exhibit a characteristic of TADG- 
15 (e.g., binding to an antibody specific for TADG-15) can 
be assessed by methods described herein. Purified TADG-15 
or antigenic fragments of TADG-15 can be used to generate 
new antibodies or to test existing antibodies (e.g., as positive 
controls in a diagnostic assay) by employing standard pro- 
tocols known to those skilled in the art. Included in this 
invention are polyclonal antisera generated by using TADG- 
15 or a fragment of TADG-15 as the immunogen in, e.g., 
rabbits. Standard protocols for monoclonal and polyclonal 
antibody production known to those skilled in this art are 
employed. The monoclonal antibodies generated by this 
procedure can be screened for the ability to identify recom- 
binant TADG-15 cDNA clones, and to distinguish them 
from known cDNA clones. 

Further included in this invention are TADG-15 proteins 
which are encoded at least in part by portions of SEQ ID 
NO:2, e.g., products of alternative mRNA splicing or alter- 
native protein processing events, or in which a section of 
TADG-15 sequence has been deleted. The fragment, or the 
intact TADG-15 polypeptide, may be covalently linked to 
another polypeptide, e.g. which acts as a label, a ligand or a 
means to increase antigenicity. 

The invention also includes a polyclonal or monoclonal 
antibody which specifically binds to TADG-15. The inven- 
tion encompasses not only an intact monoclonal antibody, 
but also an immunologically-active antibody fragment, e.g., 
a Fab or (Fab)2 fragment; an engineered single chain Fv 
molecule; or a chimeric molecule, e.g., an antibody which 
contains the binding specificity of one antibody, e.g., of 
murine origin, and the remaining portions of another 
antibody, e.g., of human origin. 

In one embodiment, the antibody, or a fragment thereof, 
may be linked to a toxin or to a detectable label, e.g. a
radioactive label, non-radioactive isotopic label, fluorescent
label, chemiluminescent label, paramagnetic label, enzyme 
label, or colorimetric label. Examples of suitable toxins 
include diphtheria toxin, Pseudomonas exotoxin A, ricin, 
and cholera toxin. Examples of suitable enzyme labels 
include malate dehydrogenase, staphylococcal nuclease, delta-
5-steroid isomerase, alcohol dehydrogenase, alpha-glycerol 
phosphate dehydrogenase, triose phosphate isomerase, 
peroxidase, alkaline phosphatase, asparaginase, glucose 
oxidase, beta-galactosidase, ribonuclease, urease, catalase, 
glucose-6-phosphate dehydrogenase, glucoamylase, 
acetylcholinesterase, etc. Examples of suitable radioisotopic 
labels include ³H, ¹²⁵I, ¹³¹I, ³²P, ³⁵S, ¹⁴C, etc.

Paramagnetic isotopes for purposes of in vivo diagnosis 
can also be used according to the methods of this invention. 
There are numerous examples of elements that are useful in 
magnetic resonance imaging. For discussions on in vivo 
nuclear magnetic resonance imaging, see, for example, 
Schaefer et al., (1989) JACC 14, 472-480; Shreve et al.,
(1986) Magn. Reson. Med. 3, 336-340; Wolf, G. L., (1984)
Physiol. Chem. Phys. Med. NMR 16, 93-95; Wesbey et al.,
(1984) Physiol. Chem. Phys. Med. NMR 16, 145-155;
Runge et al., (1984) Invest. Radiol. 19, 408-415. Examples
of suitable fluorescent labels include a fluorescein label, an
isothiocyanate label, a rhodamine label, a phycoerythrin
label, a phycocyanin label, an allophycocyanin label, an
o-phthaldehyde label, a fluorescamine label, etc. Examples of
chemiluminescent labels include a luminol label, an isoluminol
label, an aromatic acridinium ester label, an imidazole
label, an acridinium salt label, an oxalate ester label, a 
luciferin label, a luciferase label, an aequorin label, etc. 

Those of ordinary skill in the art will know of other 
suitable labels which may be employed in accordance with 
the present invention. The binding of these labels to anti- 
bodies or fragments thereof can be accomplished using 
standard techniques commonly known to those of ordinary 
skill in the art. Typical techniques are described by Kennedy 
et al., (1976) Clin. Chim. Acta 70, 1-31; and Schurs et al.,
(1977) Clin. Chim. Acta 81, 1-40. Coupling techniques
mentioned in the latter are the glutaraldehyde method, the
periodate method, the dimaleimide method, and the
m-maleimidobenzyl-N-hydroxy-succinimide ester method.
All of these methods are incorporated by reference herein. 

Also within the invention is a method of detecting TADG- 
15 protein in a biological sample, which includes the steps 
of contacting the sample with the labeled antibody, e.g., 
radioactively tagged antibody specific for TADG-15, and 
determining whether the antibody binds to a component of 
the sample. 

As described herein, the invention provides a number of 
diagnostic advantages and uses. For example, the TADG-15 
protein is useful in diagnosing cancer in different tissues
since this protein is highly overexpressed in tumor cells. 
Antibodies (or antigen-binding fragments thereof) which 
bind to an epitope specific for TADG-15, are useful in a 
method of detecting TADG-15 protein in a biological sample 
for diagnosis of cancerous or neoplastic transformation. This 
method includes the steps of obtaining a biological sample 
(e.g., cells, blood, plasma, tissue, etc.) from a patient sus- 
pected of having cancer, contacting the sample with a 
labeled antibody (e.g., radioactively tagged antibody) spe- 
cific for TADG-15, and detecting the TADG-15 protein 
using standard immunoassay techniques such as an ELISA. 
Antibody binding to the biological sample indicates that the 
sample contains a component which specifically binds to an 
epitope within TADG-15. 

Likewise, a standard Northern blot assay can be used to 
ascertain the relative amounts of TADG-15 mRNA in a cell
or tissue obtained from a patient suspected of having cancer,
in accordance with conventional Northern hybridization 
techniques known to those of ordinary skill in the art. This 
Northern assay uses a hybridization probe, e.g. radiolabelled
TADG-15 cDNA, either containing the full-length, single
stranded DNA having a sequence complementary to SEQ ID
NO: 1 (FIG. 9), or a fragment of that DNA sequence at least
20 (preferably at least 30, more preferably at least 50, and
most preferably at least 100) consecutive nucleotides in
length. The DNA hybridization probe can be labeled by any
of the many different methods known to those skilled in this
art.

Antibodies to the TADG-15 protein can be used in an 
immunoassay to detect increased levels of TADG-15 protein
expression in tissues suspected of neoplastic transformation.
These same uses can be achieved with Northern blot assays 
and analyses. 

The present invention is directed to DNA encoding a 
TADG-15 protein selected from the group consisting of: (a) 
isolated DNA which encodes a TADG-15 protein; (b) iso- 
lated DNA which hybridizes to isolated DNA of (a) above 
and which encodes a TADG-15 protein; and (c) isolated 
DNA differing from the isolated DNAs of (a) and (b) above 
in codon sequence due to the degeneracy of the genetic code, 
and which encodes a TADG-15 protein. Preferably, the DNA 
has the sequence shown in SEQ ID No: 1. More preferably,
the DNA encodes a TADG-15 protein having the amino acid 
sequence shown in SEQ ID No: 2. 

The present invention is also directed to a vector capable 
of expressing the DNA of the present invention adapted for 
expression in a recombinant cell and regulatory elements 
necessary for expression of the DNA in the cell. Preferably, 
the vector contains DNA encoding a TADG-15 protein 
having the amino acid sequence shown in SEQ ID No: 2. 

The present invention is also directed to a host cell 
transfected with the vector described herein, said vector 
expressing a TADG-15 protein. Representative host cells 
include bacterial cells, mammalian cells and
insect cells.

The present invention is also directed to an isolated and
purified TADG-15 protein coded for by DNA selected from
the group consisting of: (a) isolated DNA which encodes a
TADG-15 protein; (b) isolated DNA which hybridizes to
isolated DNA of (a) above and which encodes a TADG-15
protein; and (c) isolated DNA differing from the isolated
DNAs of (a) and (b) above in codon sequence due to the
degeneracy of the genetic code, and which encodes a TADG-
15 protein. Preferably, the isolated and purified TADG-15
protein has the amino acid sequence shown in
SEQ ID No: 2.

The present invention is also directed to a method of 
detecting expression of the protein of claim 1, comprising 
the steps of: (a) contacting mRNA obtained from the cell 
with the labeled hybridization probe; and (b) detecting
hybridization of the probe with the mRNA. 

The following examples are given for the purpose of 
illustrating various embodiments of the invention and are 
not meant to limit the present invention in any fashion. 


EXAMPLE 1 
Tissue collection and storage 

Upon patient hysterectomy, bilateral
salpingo-oophorectomy, or surgical removal of neoplastic
tissue, the specimen was retrieved and placed on ice. The
specimen was then taken to the resident pathologist for 
isolation and identification of specific tissue samples. 



Finally, the sample was frozen in liquid nitrogen, logged into 
the laboratory record and stored at -80° C. Additional
specimens were frequently obtained from the Cooperative 
Human Tissue Network (CHTN). These samples were pre- 
pared by the CHTN and shipped to us on dry ice. Upon 
arrival, these specimens were logged into the laboratory 
record and stored at -80° C. 

EXAMPLE 2 
mRNA isolation and cDNA synthesis 

Forty-one ovarian tumors (10 low malignant potential 
tumors and 31 carcinomas) and 10 normal ovaries were 
obtained from surgical specimens and frozen in liquid nitro- 
gen. The human ovarian carcinoma cell lines SW 626 and 
Caov 3, the human breast carcinoma cell lines MDA-MB- 
231 and MDA-MB-435S, and the human uterine cervical 
carcinoma cell line HeLa were purchased from the American
Type Culture Collection (Rockville, Md.). Cells were cultured
to subconfluency in Dulbecco's modified Eagle's
medium, supplemented with 10% (v/v) fetal bovine serum and
antibiotics. 

Messenger RNA (mRNA) isolation was performed 
according to the manufacturer's instructions using the Mini 
RiboSep™ Ultra mRNA isolation kit purchased from Becton
Dickinson (cat. #30034). This was an oligo(dT) chromatography-based
system of mRNA isolation. The amount of
mRNA recovered was quantitated by UV spectrophotom- 
etry. 

First strand complementary DNA (cDNA) was synthesized
using 5.0 μg of mRNA and either random hexamer or
oligo(dT) primers according to the manufacturer's protocol 
utilizing a first strand synthesis kit obtained from Clontech 
(cat.# K1402-1). The purity of the cDNA was evaluated by 
PCR using primers specific for the p53 gene. These primers 
span an intron such that pure cDNA can be distinguished 
from cDNA that is contaminated with genomic DNA. 

EXAMPLE 3 

PCR reactions 

The mRNA overexpression of TADG-15 was determined 
using a quantitative PCR. Oligonucleotide primers were 
used for: TADG-15, forward
5'-ATGACAGAGGATTCAGGTAC-3' and reverse
5'-GAAGGTGAAGTCATTGAAGA-3'; and β-tubulin, forward
5'-TGCATTGACAACGAGGC-3' and reverse
5'-CTGTCTTGACATTGTTG-3'. β-tubulin was utilized as
an internal control. Reactions were carried out as follows:
first strand cDNA generated from 50 ng of mRNA was
used as template in the presence of 1.0 mM MgCl2, 0.2 mM
dNTPs, 0.025 U Taq polymerase/μl of reaction, and
1× buffer supplied with enzyme. In addition, primers must be
added to the PCR reaction. Degenerate primers which may
amplify a variety of cDNAs are used at a final concentration
of 2.0 μM each, whereas primers which amplify specific
cDNAs are added to a final concentration of 0.2 μM each.

After initial denaturation at 95° C. for 3 minutes, thirty
cycles of PCR are carried out in a Perkin Elmer Gene Amp
2400 thermal cycler. Each cycle consists of 30 seconds of
denaturation at 95° C., 30 seconds of primer annealing at the
appropriate annealing temperature, and 30 seconds of extension
at 72° C. The final cycle will be extended at 72° C. for
7 minutes. To ensure that the reaction succeeded, a fraction
of the mixture will be electrophoresed through a 2%
agarose/TAE gel stained with ethidium bromide (final concentration
1 μg/ml). The annealing temperature varies
according to the primers that are used in the PCR reaction.
For the reactions involving degenerate primers, an annealing
temperature of 48° C. was used. The appropriate annealing
temperature for the TADG-15 and β-tubulin specific primers
is 62° C.
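
The cycling conditions described above can be written out as a step list, which makes the hold times easy to check. The following is only a sketch under stated assumptions: the names Step, build_program and total_seconds are hypothetical, the 7-minute hold is modeled as a separate final extension added after the last cycle, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


def build_program(annealing_temp_c: float, cycles: int = 30) -> List[Step]:
    """Expand the protocol into an ordered list of hold steps."""
    program = [Step("initial denaturation", 95.0, 180)]
    for _ in range(cycles):
        program.append(Step("denaturation", 95.0, 30))
        program.append(Step("annealing", annealing_temp_c, 30))
        program.append(Step("extension", 72.0, 30))
    # Assumption: the 7 minutes are an extra hold after the last cycle.
    program.append(Step("final extension", 72.0, 420))
    return program


def total_seconds(program: List[Step]) -> int:
    """Sum of hold times only; ramping between temperatures is not modeled."""
    return sum(step.seconds for step in program)


# 62 C for the TADG-15 and beta-tubulin specific primers, 48 C for degenerate primers.
specific = build_program(annealing_temp_c=62.0)
degenerate = build_program(annealing_temp_c=48.0)
print(total_seconds(specific) / 60, "minutes of programmed hold time")
```

Under these assumptions both programs contain the same hold times and differ only in the annealing temperature.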




EXAMPLE 4 
T-vector ligation and transformations 

The purified PCR products are ligated into the Promega
T-vector plasmid and the ligation products are used to
transform JM109 competent cells according to the manufacturer's
instructions (Promega cat. #A3610). Positive
colonies were cultured for amplification, the plasmid DNA
isolated by means of the Wizard™ Minipreps DNA purification
system (Promega cat. #A7500), and the plasmids were
digested with ApaI and SacI restriction enzymes to determine
the size of the insert. Plasmids with inserts of the
size(s) visualized by the previously described PCR product
gel electrophoresis were sequenced.

EXAMPLE 5

DNA sequencing 

Utilizing a plasmid specific primer near the cloning site,
sequencing reactions were carried out using PRISM™
Ready Reaction Dye Deoxy™ terminators (Applied Biosystems
cat. #401384) according to the manufacturer's instructions.
Residual dye terminators were removed from the
completed sequencing reaction using a Centri-sep™ spin
column (Princeton Separation cat. #CS-901). An Applied
Biosystems Model 373A DNA Sequencing System was
available and was used for sequence analysis. Based upon
the determined sequence, primers that specifically amplify
the gene of interest were designed and synthesized.

EXAMPLE 6 

Northern blot analysis

mRNAs (10 μg) were size separated by electrophoresis
through a 1% formaldehyde-agarose gel in 0.02 M MOPS,
0.05 M sodium acetate (pH 7.0), and 0.001 M EDTA. The
mRNAs were then blotted to Hybond-N (Amersham) by
capillary action in 20×SSPE. The RNAs are fixed to the
membrane by baking for 2 hours at 80° C.

Additional multiple tissue northern (MTN) blots were
purchased from CLONTECH Laboratories, Inc. These blots
include the Human MTN blot (cat. #7760-1), the Human
MTN II blot (cat. #7759-1), the Human Fetal MTN II blot
(cat. #7756-1), and the Human Brain MTN III blot
(cat. #7750-1). The appropriate probes were radiolabelled
utilizing the Prime-a-Gene Labeling System available from
Promega (cat. #U1100). The blots were probed and stripped
according to the ExpressHyb Hybridization Solution protocol
available from CLONTECH (cat. #8015-1 or 8015-2).

EXAMPLE 7 

Quantitative PCR

Quantitative PCR was performed in a reaction mixture
consisting of cDNA derived from 50 ng of mRNA, 5 pmol
of sense and antisense primers for TADG-15 and the internal
control β-tubulin, 0.2 mmol of dNTPs, 0.5 μCi of [α-³²P]dGTP,
and 0.625 U of Taq polymerase in 1× buffer in a final
volume of 25 μl. This mixture was subjected to 1 minute of
denaturation at 95° C. followed by 30 cycles of denaturation
for 30 seconds at 95° C., 30 seconds of annealing at 62° C.,
and 1 minute of extension at 72° C., with an additional 7
minutes of extension on the last cycle. The product was
electrophoresed through a 2% agarose gel for separation, the
gel was dried under vacuum and autoradiographed. The
relative radioactivity of each band was determined by PhosphorImager
from Molecular Dynamics.
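
The band radioactivity measured in this way is reported elsewhere in the specification as a ratio of TADG-15 signal to β-tubulin signal (mean ± SD). A minimal sketch of that arithmetic follows. The function names and the band readings are hypothetical, and the use of the sample standard deviation is an assumption, since the specification does not state which form of SD was used.

```python
from statistics import mean, stdev
from typing import List, Tuple


def expression_ratio(target_signal: float, control_signal: float) -> float:
    """Ratio of target band signal to internal-control band signal."""
    if control_signal <= 0:
        raise ValueError("control band signal must be positive")
    return target_signal / control_signal


def summarize(ratios: List[float]) -> Tuple[float, float]:
    """Return (mean, sample SD); requires at least two ratios."""
    return mean(ratios), stdev(ratios)


# Hypothetical (TADG-15, beta-tubulin) band readings in arbitrary units.
readings = [(150.0, 200.0), (90.0, 100.0), (60.0, 100.0)]
ratios = [expression_ratio(target, control) for target, control in readings]
avg, sd = summarize(ratios)
print(f"{avg:.3f} ± {sd:.3f}")
```

Normalizing each lane to its own internal control is what allows lanes loaded with slightly different amounts of cDNA to be compared.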






EXAMPLE 8 

The present invention describes the use of primers
directed to conserved areas of the serine protease family to
identify members of that family which are overexpressed in
carcinoma. Several genes were identified and cloned in other 
tissues, but not previously associated with ovarian carci- 
noma. The present invention describes a protease identified 
in ovarian carcinoma. This gene was identified using primers 
to the conserved area surrounding the catalytic domain of 
the conserved amino acid histidine and the downstream 
conserved amino acid serine which lies approximately 150 
amino acids towards the carboxyl end of the protease. 

The gene encoding the novel extracellular serine protease 
of the present invention was identified from a group of 
proteases overexpressed in carcinoma by subcloning and 
sequencing the appropriate PCR products. An example of
such a PCR reaction is given in FIG. 1. Subcloning and
sequencing of individual bands from such an amplification 
provided a basis for identifying the protease of the present 
invention. 

EXAMPLE 9 

The sequence determined for the catalytic domain of 
TADG-15 is presented in FIG. 2 and is consistent with other 
serine proteases and specifically contains conserved amino 
acids appropriate for the catalytic domain of the trypsin-like 
serine protease family. Specific primers (20mers) derived
from this sequence were used. 

A series of normal and tumor cDNAs were examined to 
determine the expression of the TADG-15 gene in ovarian 
carcinoma. In a series of normal derived cDNA compared to 
carcinoma derived cDNA using β-tubulin as an internal
control for PCR amplification, TADG-15 was significantly
overexpressed in all of the carcinomas examined and either 
was not detected or was detected at a very low level in 
normal epithelial tissue (FIG. 3). This evaluation was 
extended to a standard panel of about 40 tumors. Using these 
specific primers, the expression of this gene was also exam- 
ined in tumor cell lines derived from both ovarian and breast 
carcinoma tissues as shown in FIG. 5 and in other tumor 
tissues as shown in FIG. 6. The expression of TADG-15 was 
also observed in carcinomas of the breast, colon, prostate 
and lung. 

Using the specific sequence for TADG-15 covering the 
full domain of the catalytic site as a probe for Northern blot 
analysis, three Northern blots were examined: one derived 
from ovarian tissues, both normal and carcinoma; one from 
fetal tissues; and one from adult normal tissues. As shown in 
FIG. 7, TADG-15 transcripts were noted in all ovarian 
carcinomas, but were not present in detectable levels in any 
of the following tissues: a) normal ovary, b) fetal liver and 
brain, c) adult spleen, thymus, testes, ovary and peripheral
blood lymphocytes, d) skeletal muscle, liver, brain or heart. 
The transcript size was found to be approximately 3.2 kb. 
The hybridization for the fetal and adult blots was appro- 
priate and done with the same probe as with the ovarian 
tissue. Subsequent to this examination, it was confirmed that 
these blots contained other detectable mRNA transcripts.

Initially using the catalytic domain of the protease to 
probe Hela cDNA and ovarian tumor cDNA libraries, one 
clone was obtained covering the entire 3' end of the TADG- 
15 gene from the ovarian tumor library. On further screening
using the 5' end of the newly detected clones, two more 
clones were identified covering the 5' end of the TADG-15 
gene from the Hela library (FIG. 8). The complete nucle- 
otide sequence (SEQ ID No: 1) is provided in FIG. 9 along 
with translation of the open reading frame (SEQ ID No: 2). 

In the nucleotide sequence, there is a Kozak sequence
typical of sequences upstream from the initiation site of
translation. There is also a poly-adenylation signal sequence
and a polyadenylated tail. The open reading frame consists
of an 855 amino acid sequence (SEQ ID No: 2) which includes
an amino terminal cytoplasmic tail from amino acids 1-50,
an approximately 22 amino acid transmembrane domain
followed by an extracellular sequence preceding two CUB
repeats identified from complement subcomponents C1r and
C1s. These two repeats are followed by four repeat domains
of a class A motif of the LDL receptor and these four repeats
are followed by the protease enzyme of the trypsin family
constituting the carboxyl end of the TADG-15 protein (FIG.
11). Also a clear delineation of the catalytic domain conserved
histidine, aspartic acid, serine series along with a
series of amino acids conserved in the serine protease family
is indicated (FIG. 10).

A search of GenBank for similar previously identified
sequences yielded one such sequence with relatively high
homology to a portion of the TADG-15 gene. The similarity
between the portion of TADG-15 from nucleotide #182 to
3139 and SNC-19 (GenBank accession #U20428) is
approximately 97% (FIG. 12). There are however significant
differences between SNC-19 and TADG-15 viz. TADG-15 
has an open reading frame of 855 amino acids whereas the 
longest ORF of SNC-19 is only 173 amino acids. SNC-19 
does not include a proper start site for the initiation of 
translation nor does it include the amino terminal portion of 
the protein encoded by TADG-15. Moreover, SNC-19 does 
not include an ORF for a functional serine protease because 
the His, Asp and Ser residues necessary for function are 
encoded in different reading frames. 
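
The comparison above turns on open reading frames: which frame a stretch of codons falls in, and how long a frame runs from a start codon to a stop codon. A minimal sketch of a three-frame ORF search is given below for illustration. It is not the analysis used to obtain the figures above; it scans the forward strand only, uses the standard genetic code, counts only ORFs closed by a stop codon, and the function name and toy sequence are hypothetical.

```python
from typing import Tuple

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def longest_orf(dna: str) -> Tuple[str, int]:
    """Return (protein, frame) for the longest ATG-to-stop ORF.

    Frames are 0, 1 or 2 on the forward strand. ORFs that run off the
    end of the sequence without a stop codon are ignored. Unrecognized
    codons are translated as 'X'. Returns ("", -1) if no ORF is found.
    """
    dna = dna.upper()
    best = ("", -1)
    for frame in range(3):
        protein = []
        for pos in range(frame, len(dna) - 2, 3):
            residue = CODON_TABLE.get(dna[pos:pos + 3], "X")
            if protein:
                if residue == "*":
                    if len(protein) > len(best[0]):
                        best = ("".join(protein), frame)
                    protein = []
                else:
                    protein.append(residue)
            elif residue == "M":
                protein = ["M"]
    return best


# Toy input: a short start-codon region followed by an artificial stop codon.
print(longest_orf("CCATGGGGAGCGATTAAGG"))
```

Residues that are encoded in different frames can never appear together in a single translated product from one such frame, which is the point made above about the His, Asp and Ser residues in SNC-19.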

TADG-15 is a highly overexpressed gene in tumors. It is 
expressed in a limited number of normal tissues, primarily 
tissues that are involved in either uptake or secretion of 
molecules e.g. colon and pancreas. TADG-15 is further 
novel in its component structure of domains in that it has a 
protease catalytic domain which could be released and used 
as a diagnostic and which has the potential for a target for 
therapeutic intervention. TADG-15 also has ligand binding 
domains which are commonly associated with molecules 
that internalize or take-up ligands from the external surface 
of the cell as does the LDL receptor for the LDL cholesterol 
complex. There is potential that these domains may be 
involved in uptake of specific ligands and they may offer the 
potential for making delivery of toxic molecules or genes to 
tumor cells which express this molecule on their surface. It 
has features that are similar to the hepsin serine protease 
molecule in that it also has an amino-terminal transmembrane
domain with the proteolytic catalytic domain extended
into the extracellular matrix. The difference here is that
TADG-15 includes these ligand binding repeat domains
which the hepsin gene does not have. In addition to the use
of this gene as a diagnostic or therapeutic target in ovarian
carcinoma and other carcinomas including breast, prostate,
lung and colon, its ligand-binding domains may be valuable
in the uptake of specific molecules into tumor cells. Table 2
shows the number of cases with overexpression of TADG15
in normal ovaries and ovarian tumors.

Any patents or publications mentioned in this specifica- 
tion are indicative of the levels of those skilled in the art to 
which the invention pertains. These patents and publications
are herein incorporated by reference to the same extent as if 
each individual publication was specifically and individually 
indicated to be incorporated by reference. 


One skilled in the art will readily appreciate that the 
present invention is well adapted to carry out the objects and 
obtain the ends and advantages mentioned, as well as those 
inherent therein. The present examples along with the 
methods, procedures, treatments, molecules, and specific 
compounds described herein are presently representative of 
preferred embodiments, are exemplary, and are not intended 
as limitations on the scope of the invention. Changes therein 
and other uses will occur to those skilled in the art which are
encompassed within the spirit of the invention as defined by 
the scope of the claims. 

TABLE 2

Number of cases with overexpression of TADG15
in normal ovaries and ovarian tumors.

                     N    overexpression of TADG15    expression ratioᵃ

Normal              10         0 (0%)                  0.182 ± 0.024
LMP                 10        10 (100%)                0.847 ± 0.419
  serous             6         6 (100%)                0.862 ± 0.419
  mucinous           4         4 (100%)                0.825 ± 0.483
Carcinoma           31        31 (100%)                0.771 ± 0.380
  serous            18        18 (100%)                0.779 ± 0.332
  mucinous           7         7 (100%)                0.907 ± 0.584
  endometrioid       3         3 (100%)                0.502 ± 0.083
  clear cell         3         3 (100%)                0.672 ± 0.077

ᵃThe ratio of expression level of TADG15 to β-tubulin (mean ± SD)



SEQUENCE LISTING 



<160> NUMBER OF SEQ ID NOS: 13

<210> SEQ ID NO 1 

<211> LENGTH: 3147 

<212> TYPE: DNA 

<213> ORGANISM: Homo sapiens 

<220> FEATURE: 

<222> LOCATION: 23..2589

<223> OTHER INFORMATION: cDNA sequence of TADG-15
<400> SEQUENCE: 1

-tcaagagcgg cctcggggta ccatggggag cgatcgggcc cgcaagggcg gagggggccc 60 
gaaggacttc ggcgcgggac tcaagtacao ctcccggcac gagaaagtga atggcttgga 120 



5,972,616 

17 18 

-continued 

ggaaggcgtg gagttcctgc cagtcaacao cgtcaagaag gtggaaaagc atggcccggg 180 

gcgctgggtg gtgctggcag ccgtgctgat cggcctcctc ttggtcttgc tggggatcgg 240 

cttcctggtg tggcatttgc agtaccggga cgtgcgtgtc cagaaggtct tcaatggcta 300 

catgaggat-c acaaatgaga attttgtgga tgcctacgag aactccaact ccactgagtt 360 

-tg-taagcctg gccagcaagg tgaaggacgc gctgoagctg ctgtacagcg gagtcccatt 420 

cctgggcccc taccacaagg agtcggctgt gacggccttc agcgagggca gcgtcatcgc 480 

ctactactgg tctgagttca gcatcccgca gcacctggtg gaggaggccg agcgcgtcat 54 0 

ggccgaggag cgcgtagtca tgctgccccc gcgggcgcgc tccctgaagt cctttgtggt 600 

cacct-cagtg gtggctttcc ccacggactc caaaacagta cagaggaccc aggacaacag 660 

ctgcagcttt ggcctgcacg cccgcggtgt ggagctgatg cgcttcacca cgcccggctt 720 

ccctgacagc ccctaccccg ctcatgcccg ctgccagtgg gccctgcggg gggacgccga 780 

ctcogtgctg agcctcacct tccgcagctt tgoccttgcg tcctgcgacg agcgcggcag 84 0 

cgacctggtg acggtgtaca acaccctgag ccccatggag ccccacgccc tggtgcagtt 900 

gtgtggcacc taccctccct cctacaacct gaccttccac tcctcccago acgtcctgct 960 

catcacactg ataaccaaca ctgagcggcg gcatcccggc tttgaggcca ccttcttcca 1020 

gctgcctagg atgagcagct gtggaggccg cttacgtaaa gcccagggga cattcaacag 1080 

cccctactac ccaggccact acccacccaa cattgactgc acatggaaca ttgaggtgcc 114 0 

caacaaccag catgtgaagg tgagcttcaa attcttctac ctgctggagc ccggcgtgcc 1200 

tgcgggcacc tgccccaagg actacgtgga gatcaatggg gagaaatact gcggagagag 1260 

gtcccagttc gtcgtcacca gcaacagcaa caaga'tcaca gttcgcttcc actcagatca 1320 

gtcctacacc gacaccggct tcttagctga atacctctcc tacgactcca g-bgaccca-tg 1380 

cccggggcag ttcacgtgcc gcacggggcg gtgtatccgg aaggagctgc gctgtgatgg 144 0 

ct:gggccgac tgcaccgacc acagcgatga gc'tcaact:gc agttgcgacg ccggccacca 1500 

gttcacgtgc aagaacaagt tctgcaagcc cctcttctgg gtctgcgaca gtgtgaacga 1560 

c-tgcggagac aacagcgacg agcagggg-tg cag-ttg-tccg gcccagacct tcagg-tgt^c 1620 

caatgggaag tgcc-tc-tcga aaagccagca gtgcaatggg aaggacgact gtggggacgg 1680 

gtccgacgag gcctcctgcc ccaagg-tgaa cgtcgtcact tgtaccaaac acacctaccg 174 0 

ctgcctcaat gggctctgct tgagcaaggg caaccctgag tgtgacggga aggaggacbg 1800 

tagcgacggc tcagatgaga aggactgcga ctgtgggctg cggtcattca cgagacaggc 1860 

tcgtgttgtt gggggcacgg atgcggatga gggcgagtgg ccctggcagg taagcctgca 1920 

tgctctgggc cagggccaca tctgcggtgc ttccctcatc tctcccaact ggctggtctc 1980 

tgccgcacac -bgctacatcg atigacagagg a-k-tcagg-tac tcagacccca cgcag-kggac 2040 

ggccttcctg ggcttgcacg accagagcca gcgcagcgcc cctggggtgc aggagcgcag 2100 

gc-bcaagcgc at:ca-bc-tccc accccttctt caatgacttc acct-tcgact: atigacatcgc 2160 

gc-tgctggag c^ggagaaac cggcagag^a cagctccatg g-bgcggccca tctgcctgcc 2220 

ggacgcct:cc catgtc-ttcc ctgccggcaa ggcca-tctgg gtcacgggct ggggacacac 2280 

ccagtatgga ggcactggcg cgctgatcct gcaaaagggt gagatccgcg tcatcaacca 234 0 

gaccacctgc gagaacctcc -tgccgcagca gatcacgccg cgca-tgatig't gcgtgggc^t 2400 

cctcagcggc ggcgtggact cctgccaggg tgat-tccggg ggacccctg-t ccagcgtgga 2460 

ggcggatggg cggatcttcc aggccggtgt gg-tgagctgg ggagacggct gcgctcagag 2520 






gaacaagcca ggcgtgtaca caaggctccc tctgtttcgg gactggatca aagagaacac 2580

tggggtatag gggccggggc cacccaaatg tgtacacctg cggggccacc catcgtccac 2640

cccagtgtgc acgcctgcag gctggagact ggaccgctga ctgcaccagc gcccccagaa 2700

catacactgt gaactcaatc tccagggctc caaatctgcc tagaaaacct ctcgcttcct 2760

cagcctccaa agtggagctg ggagg^agaa ggggaggaca ctggtggttc tactgaccca 2820

actgggggca aaggtttgaa gacacagccb cccccgccag ccccaagctg ggccgaggcg 2880

cgtttgtgta tatctgcctc ccctgtctgt aaggagcagc gggaacggag cttcggagcc 2940

tcctcagtga aggtggtggg gctgccggat ctgggctgtg gggccc'b'tgg gccacgctct 3000

tgaggaagcc caggctcgga ggaccctgga aaacagacgg gtctgagact gaaattgttt 3060

taccagctcc cagggtggac ttcagtgtgt gtatttgtgt aaa-tgggtiaa aacaatttat 3120

ttctttttaa aaaaaaaaaa aaaaaaa 3147



<210> SEQ ID NO 2 

<211> LENGTH: 855

<212> TYPE: PRT

<213> ORGANISM: Homo sapiens

<220> FEATURE:

<223> OTHER INFORMATION: Amino acid sequence of TADG-15 encoded by
nucleotides 23 to 2589 of Sequence 1

<400> SEQUENCE: 2

Met Gly Ser Asp Arg Ala Arg Lys Gly Gly Gly Gly Pro Lys Asp
5 10 15

Phe Gly Ala Gly Leu Lys Tyr Asn Ser Arg His Glu Lys Val Asn
20 25 30

Gly Leu Glu Glu Gly Val Glu Phe Leu Pro Val Asn Asn Val Lys
35 40 45

Lys Val Glu Lys His Gly Pro Gly Arg Trp Val Val Leu Ala Ala
50 55 60

Val Leu Ile Gly Leu Leu Leu Val Leu Leu Gly Ile Gly Phe Leu
65 70 75

Val Trp His Leu Gln Tyr Arg Asp Val Arg Val Gln Lys Val Phe
80 85 90

Asn Gly Tyr Met Arg Ile Thr Asn Glu Asn Phe Val Asp Ala Tyr
95 100 105

Glu Asn Ser Asn Ser Thr Glu Phe Val Ser Leu Ala Ser Lys Val
110 115 120

Lys Asp Ala Leu Lys Leu Leu Tyr Ser Gly Val Pro Phe Leu Gly
125 130 135

Pro Tyr His Lys Glu Ser Ala Val Thr Ala Phe Ser Glu Gly Ser
140 145 150

Val Ile Ala Tyr Tyr Trp Ser Glu Phe Ser Ile Pro Gln His Leu
155 160 165

Val Glu Glu Ala Glu Arg Val Met Ala Glu Glu Arg Val Val Met
170 175 180

Leu Pro Pro Arg Ala Arg Ser Leu Lys Ser Phe Val Val Thr Ser
185 190 195

Val Val Ala Phe Pro Thr Asp Ser Lys Thr Val Gln Arg Thr Gln
200 205 210

Asp Asn Ser Cys Ser Phe Gly Leu His Ala Arg Gly Val Glu Leu
215 220 225

Met Arg Phe Thr Thr Pro Gly Phe Pro Asp Ser Pro Tyr Pro Ala
230 235 240






His Ala Arg Cys Gln Trp Ala Leu Arg Gly Asp Ala Asp Ser Val
245 250 255

Leu Ser Leu Thr Phe Arg Ser Phe Asp Leu Ala Ser Cys Asp Glu
260 265 270

Arg Gly Ser Asp Leu Val Thr Val Tyr Asn Thr Leu Ser Pro Met
275 280 285

Glu Pro His Ala Leu Val Gln Leu Cys Gly Thr Tyr Pro Pro Ser
290 295 300

Tyr Asn Leu Thr Phe His Ser Ser Gln Asn Val Leu Leu Ile Thr
305 310 315

Leu Ile Thr Asn Thr Glu Arg Arg His Pro Gly Phe Glu Ala Thr
320 325 330

Phe Phe Gln Leu Pro Arg Met Ser Ser Cys Gly Gly Arg Leu Arg
335 340 345

Lys Ala Gln Gly Thr Phe Asn Ser Pro Tyr Tyr Pro Gly His Tyr
350 355 360

Pro Pro Asn Ile Asp Cys Thr Trp Asn Ile Glu Val Pro Asn Asn
365 370 375

Gln His Val Lys Val Ser Phe Lys Phe Phe Tyr Leu Leu Glu Pro
380 385 390

Gly Val Pro Ala Gly Thr Cys Pro Lys Asp Tyr Val Glu Ile Asn
395 400 405

Gly Glu Lys Tyr Cys Gly Glu Arg Ser Gln Phe Val Val Thr Ser
410 415 420

Asn Ser Asn Lys Ile Thr Val Arg Phe His Ser Asp Gln Ser Tyr
425 430 435

Thr Asp Thr Gly Phe Leu Ala Glu Tyr Leu Ser Tyr Asp Ser Ser
440 445 450

Asp Pro Cys Pro Gly Gln Phe Thr Cys Arg Thr Gly Arg Cys Ile
455 460 465

Arg Lys Glu Leu Arg Cys Asp Gly Trp Ala Asp Cys Thr Asp His
470 475 480

Ser Asp Glu Leu Asn Cys Ser Cys Asp Ala Gly His Gln Phe Thr
485 490 495

Cys Lys Asn Lys Phe Cys Lys Pro Leu Phe Trp Val Cys Asp Ser
500 505 510

Val Asn Asp Cys Gly Asp Asn Ser Asp Glu Gln Gly Cys Ser Cys
515 520 525

Pro Ala Gln Thr Phe Arg Cys Ser Asn Gly Lys Cys Leu Ser Lys
530 535 540

Ser Gln Gln Cys Asn Gly Lys Asp Asp Cys Gly Asp Gly Ser Asp
545 550 555

Glu Ala Ser Cys Pro Lys Val Asn Val Val Thr Cys Thr Lys His
560 565 570

Thr Tyr Arg Cys Leu Asn Gly Leu Cys Leu Ser Lys Gly Asn Pro
575 580 585

Glu Cys Asp Gly Lys Glu Asp Cys Ser Asp Gly Ser Asp Glu Lys
590 595 600

Asp Cys Asp Cys Gly Leu Arg Ser Phe Thr Arg Gln Ala Arg Val
605 610 615

Val Gly Gly Thr Asp Ala Asp Glu Gly Glu Trp Pro Trp Gln Val
620 625 630



Ser Leu His Ala Leu Gly Gln Gly His Ile Cys Gly Ala Ser Leu
635 640 645

Ile Ser Pro Asn Trp Leu Val Ser Ala Ala His Cys Tyr Ile Asp
650 655 660

Asp Arg Gly Phe Arg Tyr Ser Asp Pro Thr Gln Trp Thr Ala Phe
665 670 675

Leu Gly Leu His Asp Gln Ser Gln Arg Ser Ala Pro Gly Val Gln
680 685 690

Glu Arg Arg Leu Lys Arg Ile Ile Ser His Pro Phe Phe Asn Asp
695 700 705

Phe Thr Phe Asp Tyr Asp Ile Ala Leu Leu Glu Leu Glu Lys Pro
710 715 720

Ala Glu Tyr Ser Ser Met Val Arg Pro Ile Cys Leu Pro Asp Ala
725 730 735

Ser His Val Phe Pro Ala Gly Lys Ala Ile Trp Val Thr Gly Trp
740 745 750

Gly His Thr Gln Tyr Gly Gly Thr Gly Ala Leu Ile Leu Gln Lys
755 760 765

Gly Glu Ile Arg Val Ile Asn Gln Thr Thr Cys Glu Asn Leu Leu
770 775 780

Pro Gln Gln Ile Thr Pro Arg Met Met Cys Val Gly Phe Leu Ser
785 790 795

Gly Gly Val Asp Ser Cys Gln Gly Asp Ser Gly Gly Pro Leu Ser
800 805 810

Ser Val Glu Ala Asp Gly Arg Ile Phe Gln Ala Gly Val Val Ser
815 820 825

Trp Gly Asp Gly Cys Ala Gln Arg Asn Lys Pro Gly Val Tyr Thr
830 835 840

Arg Leu Pro Leu Phe Arg Asp Trp Ile Lys Glu Asn Thr Gly Val
845 850 855
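
The relationship between SEQ ID NO 1 and SEQ ID NO 2 can be spot-checked by translating the start of the open reading frame. The following is a minimal sketch in Python, not part of the original listing: it uses the standard genetic code, the names CODON_TABLE, translate and orf_start are introduced here for illustration, and the input is nucleotides 23 to 70 of SEQ ID NO 1 copied from the listing with the spacing removed.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate lowercase DNA codon by codon, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Nucleotides 23-70 of SEQ ID NO 1 as printed (the first 16 codons of the ORF).
orf_start = "atggggagcgatcgggcccgcaagggcggagggggcccgaaggacttc"

# One-letter codes; compare with the first 16 residues of SEQ ID NO 2
# (Met Gly Ser Asp Arg Ala Arg Lys Gly Gly Gly Gly Pro Lys Asp Phe).
print(translate(orf_start))
```

The same function could be applied to the full coding region once the nucleotide sequence has been cleaned of scanning errors; that step is not attempted here.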



<210> SEQ ID NO 3
<211> LENGTH: 256
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:

<223> OTHER INFORMATION: Serine protease catalytic domain of hepsin
(Heps) homologous to similar domain in TADG-15

<400> SEQUENCE: 3

Arg Ile Val Gly Gly Arg Asp Thr Ser Leu Gly Arg Trp Pro Trp
5 10 15

Gln Val Ser Leu Arg Tyr Asp Gly Ala His Leu Cys Gly Gly Ser
20 25 30

Leu Leu Ser Gly Asp Trp Val Leu Thr Ala Ala His Cys Phe Pro
35 40 45

Glu Arg Asn Arg Val Leu Ser Arg Trp Arg Val Phe Ala Gly Ala
50 55 60

Val Ala Gln Ala Ser Pro His Gly Leu Gln Leu Gly Val Gln Ala
65 70 75

Val Val Tyr His Gly Gly Tyr Leu Pro Phe Arg Asp Pro Asn Ser
80 85 90

Glu Glu Asn Ser Asn Asp Ile Ala Leu Val His Leu Ser Ser Pro
95 100 105

Leu Pro Leu Thr Glu Tyr Ile Gln Pro Val Cys Leu Pro Ala Ala
110 115 120

Gly Gln Ala Leu Val Asp Gly Lys Ile Cys Thr Val Thr Gly Trp
125 130 135

Gly Asn Thr Gln Tyr Tyr Gly Gln Gln Ala Gly Val Leu Gln Glu
140 145 150

Ala Arg Val Pro Ile Ile Ser Asn Asp Val Cys Asn Gly Ala Asp
155 160 165

Phe Tyr Gly Asn Gln Ile Lys Pro Lys Met Phe Cys Ala Gly Tyr
170 175 180

Pro Glu Gly Gly Ile Asp Ala Cys Gln Gly Asp Ser Gly Gly Pro
185 190 195

Phe Val Cys Glu Asp Ser Ile Ser Arg Thr Pro Arg Trp Arg Leu
200 205 210

Cys Gly Ile Val Ser Trp Gly Thr Gly Cys Ala Leu Ala Gln Lys
215 220 225

Pro Gly Val Tyr Thr Lys Val Ser Asp Phe Arg Glu Trp Ile Phe
230 235 240

Gln Ala Ile Lys Thr His Ser Glu Ala Ser Gly Met Val Thr Gln
245 250 255

Leu



<210> SEQ ID NO 4
<211> LENGTH: 225
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:

<223> OTHER INFORMATION: Serine protease catalytic domain of Scce
homologous to similar domain in TADG-15

<400> SEQUENCE: 4

Lys Ile Ile Asp Gly Ala Pro Cys Ala Arg Gly Ser His Pro Trp
5 10 15

Gln Val Ala Leu Leu Ser Gly Asn Gln Leu His Cys Gly Gly Val
20 25 30

Leu Val Asn Glu Arg Trp Val Leu Thr Ala Ala His Cys Lys Met
35 40 45

Asn Glu Tyr Thr Val His Leu Gly Ser Asp Thr Leu Gly Asp Arg
50 55 60

Arg Ala Gln Arg Ile Lys Ala Ser Lys Ser Phe Arg His Pro Gly
65 70 75

Tyr Ser Thr Gln Thr His Val Asn Asp Leu Met Leu Val Lys Leu
80 85 90

Asn Ser Gln Ala Arg Leu Ser Ser Met Val Lys Lys Val Arg Leu
95 100 105

Pro Ser Arg Cys Glu Pro Pro Gly Thr Thr Cys Thr Val Ser Gly
110 115 120

Trp Gly Thr Thr Thr Ser Pro Asp Val Thr Phe Pro Ser Asp Leu
125 130 135

Met Cys Val Asp Val Lys Leu Ile Ser Pro Gln Asp Cys Thr Lys
140 145 150

Val Tyr Lys Asp Leu Leu Glu Asn Ser Met Leu Cys Ala Gly Ile
155 160 165

Pro Asp Ser Lys Lys Asn Ala Cys Asn Gly Asp Ser Gly Gly Pro
170 175 180

Leu Val Cys Arg Gly Thr Leu Gln Gly Leu Val Ser Trp Gly Thr
185 190 195

Phe Pro Cys Gly Gln Pro Asn Asp Pro Gly Val Tyr Thr Gln Val
200 205 210

Cys Lys Phe Thr Lys Trp Ile Asn Asp Thr Met Lys Lys His Arg
215 220 225



<210> SEQ ID NO 5
<211> LENGTH: 225
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:

<223> OTHER INFORMATION: Serine protease catalytic domain of trypsin
(Try) homologous to similar domain in TADG-15

<400> SEQUENCE: 5

Lys Ile Val Gly Gly Tyr Asn Cys Glu Glu Asn Ser Val Pro Tyr
5 10 15

Gln Val Ser Leu Asn Ser Gly Tyr His Phe Cys Gly Gly Ser Leu
20 25 30

Ile Asn Glu Gln Trp Val Val Ser Ala Gly His Cys Tyr Lys Ser
35 40 45

Arg Ile Gln Val Arg Leu Gly Glu His Asn Ile Glu Val Leu Glu
50 55 60

Gly Asn Glu Gln Phe Ile Asn Ala Ala Lys Ile Ile Arg His Pro
65 70 75

Gln Tyr Asp Arg Lys Thr Leu Asn Asn Asp Ile Met Leu Ile Lys
80 85 90

Leu Ser Ser Arg Ala Val Ile Asn Ala Arg Val Ser Thr Ile Ser
95 100 105

Leu Pro Thr Ala Pro Pro Ala Thr Gly Thr Lys Cys Leu Ile Ser
110 115 120

Gly Trp Gly Asn Thr Ala Ser Ser Gly Ala Asp Tyr Pro Asp Glu
125 130 135

Leu Gln Cys Leu Asp Ala Pro Val Leu Ser Gln Ala Lys Cys Glu
140 145 150

Ala Ser Tyr Pro Gly Lys Ile Thr Ser Asn Met Phe Cys Val Gly
155 160 165

Phe Leu Glu Gly Gly Lys Asp Ser Cys Gln Gly Asp Ser Gly Gly
170 175 180

Pro Val Val Cys Asn Gly Gln Leu Gln Gly Val Val Ser Trp Gly
185 190 195

Asp Gly Cys Ala Gln Lys Asn Lys Pro Gly Val Tyr Thr Lys Val
200 205 210

Tyr Asn Tyr Val Lys Trp Ile Lys Asn Thr Ile Ala Ala Asn Ser
215 220 225



<210> SEQ ID NO 6
<211> LENGTH: 231
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:

<223> OTHER INFORMATION: Serine protease catalytic domain of
chymotrypsin (Chymb) homologous to similar domain in TADG-15

<400> SEQUENCE: 6

Arg Ile Val Asn Gly Glu Asp Ala Val Pro Gly Ser Trp Pro Trp
5 10 15

Gln Val Ser Leu Gln Asp Lys Thr Gly Phe His Phe Cys Gly Gly
20 25 30

Ser Leu Ile Ser Glu Asp Trp Val Val Thr Ala Ala His Cys Gly
35 40 45

Val Arg Thr Ser Asp Val Val Val Ala Gly Glu Phe Asp Gln Gly
50 55 60

Ser Asp Glu Glu Asn Ile Gln Val Leu Lys Ile Ala Lys Val Phe
65 70 75

Lys Asn Pro Lys Phe Ser Ile Leu Thr Val Asn Asn Asp Ile Thr
80 85 90

Leu Leu Lys Leu Ala Thr Pro Ala Arg Phe Ser Gln Thr Val Ser
95 100 105

Ala Val Cys Leu Pro Ser Ala Asp Asp Asp Phe Pro Ala Gly Thr
110 115 120

Leu Cys Ala Thr Thr Gly Trp Gly Lys Thr Lys Tyr Asn Ala Asn
125 130 135

Lys Thr Pro Asp Lys Leu Gln Gln Ala Ala Leu Pro Leu Leu Ser
140 145 150

Asn Ala Glu Cys Lys Lys Ser Trp Gly Arg Arg Ile Thr Asp Val
155 160 165

Met Ile Cys Ala Gly Ala Ser Gly Val Ser Ser Cys Met Gly Asp
170 175 180

Ser Gly Gly Pro Leu Val Cys Gln Lys Asp Gly Ala Trp Thr Leu
185 190 195

Val Gly Ile Val Ser Trp Gly Ser Asp Thr Cys Ser Thr Ser Ser
200 205 210

Pro Gly Val Tyr Ala Arg Val Thr Lys Leu Ile Pro Trp Val Gln
215 220 225

Lys Ile Leu Ala Ala Asn
230



<210> SEQ ID NO 7
<211> LENGTH: 255
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:

<223> OTHER INFORMATION: Serine protease catalytic domain of factor 7
(Fac7) homologous to similar domain in TADG-15

<400> SEQUENCE: 7

Arg Ile Val Gly Gly Lys Val Cys Pro Lys Gly Glu Cys Pro Trp
5 10 15

Gln Val Leu Leu Leu Val Asn Gly Ala Gln Leu Cys Gly Gly Thr
20 25 30

Leu Ile Asn Thr Ile Trp Val Val Ser Ala Ala His Cys Phe Asp
35 40 45

Lys Ile Lys Asn Trp Arg Asn Leu Ile Ala Val Leu Gly Glu His
50 55 60

Asp Leu Ser Glu His Asp Gly Asp Glu Gln Ser Arg Arg Val Ala
65 70 75

Gln Val Ile Ile Pro Ser Thr Tyr Val Pro Gly Thr Thr Asn His
80 85 90

Asp Ile Ala Leu Leu Arg Leu His Gln Pro Val Val Leu Thr Asp
95 100 105

His Val Val Pro Leu Cys Leu Pro Glu Arg Thr Phe Ser Glu Arg
110 115 120

Thr Leu Ala Phe Val Arg Phe Ser Leu Val Ser Gly Trp Gly Gln
125 130 135

Leu Leu Asp Arg Gly Ala Thr Ala Leu Glu Leu Met Val Leu Asn
140 145 150

Val Pro Arg Leu Met Thr Gln Asp Cys Leu Gln Gln Ser Arg Lys
155 160 165

Val Gly Asp Ser Pro Asn Ile Thr Glu Tyr Met Phe Cys Ala Gly
170 175 180

Tyr Ser Asp Gly Ser Lys Asp Ser Cys Lys Gly Asp Ser Gly Gly
185 190 195

Pro His Ala Thr His Tyr Arg Gly Thr Trp Tyr Leu Thr Gly Ile
200 205 210

Val Ser Trp Gly Gln Gly Cys Ala Thr Val Gly His Phe Gly Val
215 220 225

Tyr Thr Arg Val Ser Gln Tyr Ile Glu Trp Leu Gln Lys Leu Met
230 235 240

Arg Ser Glu Pro Arg Pro Gly Val Leu Leu Arg Ala Pro Phe Pro
245 250 255



<210> SEQ ID NO 8
<211> LENGTH: 253
<212> TYPE: PRT
<213> ORGANISM: Unknown
<220> FEATURE:

<223> OTHER INFORMATION: Serine protease catalytic domain of tissue
plasminogen activator (Tpa) homologous to similar domain in
TADG-15

<400> SEQUENCE: 8

Arg Ile Lys Gly Gly Leu Phe Ala Asp Ile Ala Ser His Pro Trp
5 10 15

Gln Ala Ala Ile Phe Ala Lys His Arg Arg Ser Pro Gly Glu Arg
20 25 30

Phe Leu Cys Gly Gly Ile Leu Ile Ser Ser Cys Trp Ile Leu Ser
35 40 45

Ala Ala His Cys Phe Gln Glu Arg Phe Pro Pro His His Leu Thr
50 55 60

Val Ile Leu Gly Arg Thr Tyr Arg Val Val Pro Gly Glu Glu Glu
65 70 75

Gln Lys Phe Glu Val Glu Lys Tyr Ile Val His Lys Glu Phe Asp
80 85 90

Asp Asp Thr Tyr Asp Asn Asp Ile Ala Leu Leu Gln Leu Lys Ser
95 100 105

Asp Ser Ser Arg Cys Ala Gln Glu Ser Ser Val Val Arg Thr Val
110 115 120

Cys Leu Pro Pro Ala Asp Leu Gln Leu Pro Asp Trp Thr Glu Cys
125 130 135

Glu Leu Ser Gly Tyr Gly Lys His Glu Ala Leu Ser Pro Phe Tyr
140 145 150

Ser Glu Arg Leu Lys Glu Ala His Val Arg Leu Tyr Pro Ser Ser
155 160 165

Arg Cys Thr Ser Gln His Leu Leu Asn Arg Thr Val Thr Asp Asn
170 175 180

Met Leu Cys Ala Gly Asp Thr Arg Ser Gly Gly Pro Gln Ala Asn
185 190 195

Leu His Asp Ala Cys Gln Gly Asp Ser Gly Gly Pro Leu Val Cys
200 205 210

Leu Asn Asp Gly Arg Met Thr Leu Val Gly Ile Ile Ser Trp Gly
215 220 225

Leu Gly Cys Gly Gln Lys Asp Val Pro Gly Val Tyr Thr Lys Val
230 235 240

Thr Asn Tyr Leu Asp Trp Ile Arg Asp Asn Met Arg Pro
245 250

<210> SEQ ID NO 9 

<211> LENGTH: 2900

<212> TYPE: DNA

<213> ORGANISM: Homo sapiens

<220> FEATURE:

<223> OTHER INFORMATION: SNC19 mRNA sequence (U20428)
<400> SEQUENCE: 9



cgctgggtgg tgctggcagc cgtgctgatc ggcctcctct tggtcttgct ggggatcggc 60

ttcctggtgt ggcatttgca gtaccgggac gtgcgtgtcc agaaggtctt caatggctac 120

atgaggatca csaatgagaa ttttgtggat gcctacgaga actccaactc cactgagttt 180

gtaagcctgg ccagcaaggt gaaggacgcg ctgaagctgc tgtacagcgg agtcccattc 240

ctgggcccct accacaagga gtcggctgtg acggccttca gcgagggcag cgtcatcgcc 300

tactactggt ctgagttcag catcccgcag cacctggttg aggaggccga gcgcgtcatg 360

gccaggagcg cgtagtcatg ctgcccccgc gggcgcgctc cctgaagtcc tttgtggtca 420

cctcagtggt ggctttcccc acggactcca aaacagtaca gaggacccag gacaacagct 480

gcagctttgg cctgcacgcc gcggtgtgga gctgatgcgc ttcaccacgc cggcttccct 540

gacagcccct accccgctca tgcccgctgc cagtgggctg cggggacgcg acgcagtgct 600

gagctactcg agcbgactcg cagcttgact gcgcctcgac gagcgcggca gcgacctggt 660

gacgtgtaca acaccctgag ccccatggag ccccacgcct ggtgagtgtg tggcacctac 720

cctccctcct acaacctgac cttccactcc ctcccacgaa cgtcctgctc atcacactga 780

taaccaacac tgacgcggca tcccggcttt gaggccacct tcttccagct gcctaggatg 840

agcagctgtg gaggccgctt acgtaaagcc caggggacat tcaacagccc ctactaccca 900

ggccactacc cacccaacat tgactgcaca tggaaaattg aggtgcccaa caaccagcat 960

gtgaaggtgc gcttcaaatt cttctacctg ctggagcccg gcgtgcctgc gggcacctgc 1020

cccaaggact acg'bggaga-b caatggggag aaatactgcg gagagaggtc ccagttcgtc 1080

gtcaccagca acagcaacaa gatcacagtt cgcttccact cagatcagtc ctacaccgac 1140

accggcttct tagctgaata cctctcctac gactccagtg acccatgccc ggggcagttc 1200

acgtgccgca cggggcggtg tatccggaag gagctgcgct gtgatggctg ggcgactgca 1260

ccgaccacag cgatgagctc aactgcagtt gcgacgccgg ccaccagttc acgtgcaaga 1320

gcaagttctg caagctcttc tgggtctgcg acagtgtgaa cgagtgcgga gacaacagcg 1380

acgagcaggg ttgcatttgt ccggacccag accttcaggt gttccaatgg gaagtgcctc 1440

tcgaaaagcc agcagtgcaa tgggaaggac gactgtgggg acgggtccga cgaggcctcc 1500

tgccccaagg tgaacgtcgt cacttgtacc aaacacacct accgctgcct caatgggctc 1560

tgcttgagca agggcaaccc tgagtgtgac gggaaggagg actgtagcga cggctcagat 1620

gagaaggact gcgactgtgg gctgcggtca ttcacgagac aggctcgtgt tgttgggggc 1680

acggatgcgg atgagggcga gtggccctgg caggtaagcc tgcatgctct gggccagggc 1740

cacatctgcg gtgcttccct catctctccc aactggctgg tctctgccgc acactgctac 1800

atcgatgaca gaggattcag gtactcagac cccacgcagg acggccttcc tgggcttgca 1860

cgaccagagc cagcgcaggc cctggggtgc aggagcgcag gctcaagcgc atcatctccc 1920

accccttctt caatgacttc accttcgact atgacatcgc gctgctggag ctggagaaac 1980






cggcagagta cagctccatg gtgcggccca tctgcctgcc ggacgcctgc catgtcttcc 2040

ctgccggcaa ggccatctgg g^cacgggc-t ggggacacac ccagtatgga ggcactggcg 2100

cgctgatcct gcaaaagggt gagatccgcg tcatcaacca gaccacctgc gagaacctcc 2160

tgccgcagca gatcacgccg cgcatgatgt gcgtgggctt cctcagcggc ggcgtggact 2220

cctgccaggg tgattccggg ggacccctgt ccagcgiigga ggcggatggg cggatcttcc 2280

aggccggtgt ggtgagctgg ggagacgctg cgctcagagg aacaagccag gcgtgtacac 2340

oaggctccct ctgtttcggg aatggatcaa agagaacact ggggtiatiagg ggccggggcc 2400

acccaaatgt g'tacacc'bgc ggggccaccc atcgtccacc ccagtgtgca cgcctgcagg 2460

ctggagactc gcgcaccgtg acctgcacca gcgccccaga acatacactg tgaactcatc 2520

tccaggctca aatctgctag aaaacctctc gcttcctcag cctccaaagt ggagctggga 2580

gggtagaagg ggaggaacac tggtggttct actgacccaa ctggggcaag gtttgaagca 2640

cagcticcggc agcccaagtg ggcgaggacg cgtttgtgca tactgccctg c^ctat^acac 2700

ggaagacctg gatctctagt gagtgtgact gccggatctg gctgtggtcc ttggccacgc 2760

tticttgagga agcccaggct cggaggaccc tggaaaacag acgggtctga gactgaaaat 2820

ggtttaccag ctcccaggtg acttcagtgt gtgtattgtg taaatgagta aaacatttta 2880

tttcttttta aaaaaaaaaa 2900



<210> SEQ ID NO 10 
<211> LENGTH: 20 
<212> TYPE: DNA

<213> ORGANISM: Artificial Sequence

<220> FEATURE:

<221> NAME/KEY: primer_bind

<222> LOCATION: 1-20

<223> OTHER INFORMATION: Forward primer for analysis of overexpression
of TADG-15 mRNA by quantitative PCR.

<400> SEQUENCE: 10

atgacagagg attcaggtac 20 

<210> SEQ ID NO 11 

<211> LENGTH: 20

<212> TYPE: DNA

<213> ORGANISM: Artificial Sequence

<220> FEATURE:

<221> NAME/KEY: primer_bind

<222> LOCATION: 1-20

<223> OTHER INFORMATION: Reverse primer for analysis of overexpression
of TADG-15 mRNA by quantitative PCR.

<400> SEQUENCE: 11

gaaggtgaag tcattgaaga 20 
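
The orientation of the two TADG-15 primers can be checked against the printed sequences. The sketch below, written in Python for illustration only, assumes that the reverse primer should be the reverse complement of a sense-strand stretch; the helper name reverse_complement and the variable names are introduced here, and the two sense-strand fragments are copied from SEQ ID NO 9 as printed (rows ending at positions 1860 and 1980) with the spacing removed.

```python
# Complement lookup for lowercase DNA letters.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a lowercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


forward_primer = "atgacagaggattcaggtac"  # SEQ ID NO 10
reverse_primer = "gaaggtgaagtcattgaaga"  # SEQ ID NO 11

# Sense-strand fragments as printed in SEQ ID NO 9.
fragment_near_forward = "atcgatgacagaggattcaggtactcagac"
fragment_near_reverse = "accccttcttcaatgacttcaccttcgact"

# The forward primer should appear directly in the sense strand.
print(forward_primer in fragment_near_forward)

# The reverse primer should appear once it is reverse-complemented.
print(reverse_complement(reverse_primer) in fragment_near_reverse)
```

Both checks are plain string matches, so they say nothing about annealing temperature or specificity; they only confirm that the listed primers correspond to the printed template.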

<210> SEQ ID NO 12 

<211> LENGTH: 17 

<212> TYPE: DNA

<213> ORGANISM: Artificial Sequence

<220> FEATURE:

<221> NAME/KEY: primer_bind

<222> LOCATION: 1-17

<223> OTHER INFORMATION: Forward primer for analysis of β-tubulin mRNA
expression by quantitative PCR.

<400> SEQUENCE: 12

tgcattgaca acgaggc 17 



<210> SEQ ID NO 13 
<211> LENGTH: 17
<212> TYPE: DNA

<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<221> NAME/KEY: primer_bind
<222> LOCATION: 1-17

<223> OTHER INFORMATION: Forward primer for analysis of β-tubulin mRNA
expression by quantitative PCR.

<400> SEQUENCE: 13

ctgtcttgac attgttg 17 



What is claimed is: 

1. DNA encoding a Tumor Antigen Derived Gene-15
(TADG-15) protein selected from the group consisting of:

(a) isolated DNA which encodes a TADG-15 protein; 

(b) isolated DNA which hybridizes to isolated DNA of (a) 
above and which encodes a TADG-15 protein; and 


(c) isolated DNA differing from the isolated DNAs of (a) 
and (b) above in codon sequence due to the degeneracy 
of the genetic code, and which encodes a TADG-15 
protein. 

2. The DNA of claim 1, wherein said DNA has the
sequence shown in SEQ ID No:1.

3. The DNA of claim 1, wherein said TADG-15 protein 
has the amino acid sequence shown in SEQ ID No: 2. 

4. A vector comprising the DNA of claim 1 and regulatory 
elements necessary for expression of the DNA in a cell. 

5. The vector of claim 4, wherein said DNA encodes a 
TADG-15 protein having the amino acid sequence shown in 
SEQ ID No:2. 

6. A host cell transfected with the vector of claim 4, said 
vector expressing a TADG-15 protein. 

7. The host cell of claim 6, wherein said cell is selected 
from group consisting of bacterial cells, mammalian cells, 
plant cells and insect cells. 



8. The host cell of claim 7, wherein said bacterial cell is 
E. coli. 

9. Isolated and purified TADG-15 protein coded for by 
DNA selected from the group consisting of: 

(a) isolated DNA which encodes a TADG-15 protein; 

(b) isolated DNA which hybridizes to isolated DNA of (a) 
above and which encodes a TADG-15 protein; and 

(c) isolated DNA differing from the isolated DNAs of (a) 
and (b) above in codon sequence due to the degeneracy 
of the genetic code, and which encodes a TADG-15 
protein. 

10. The isolated and purified TADG-15 protein of claim 
9 having the amino acid sequence shown in SEQ ID No:2. 

11. A method of detecting expression of the protein of 
claim 9, comprising the steps of: 

(a) contacting mRNA obtained from a cell with a labeled 
hybridization probe; and 

(b) detecting hybridization of the probe with the mRNA. 





Exhibit 5 



Proc. Natl. Acad. Sci. USA

Vol. 78, No. 12, pp. 7323-7326, December 1981

Biochemistry



Catalytic mechanism of serine proteases: Reexamination of the pH
dependence of the histidyl ¹J(¹³C2-H) coupling constant in the
catalytic triad of α-lytic protease*

(NMR/enzyme mechanisms/biosynthetic isotopic enrichment/histidine auxotroph/charge-relay system)

William W. Bachovchin†, Robert Kaiser‡, John H. Richards‡, and John D. Roberts‡

†Department of Biochemistry and Pharmacology, Tufts University School of Medicine, Boston, Massachusetts 02111; and ‡The Gates and Crellin Laboratories of
Chemistry, California Institute of Technology, Pasadena, California 91125

Contributed by John D. Roberts, August 10, 1981



ABSTRACT L-Histidine, 90% enriched in ¹³C at the C2 position,
was incorporated into the catalytic triad of α-lytic protease
(EC 3.4.21.12) with the aid of a histidine-requiring mutant
of Lysobacter enzymogenes (ATCC 29487), and the pH dependence
of the coupling constant between this carbon atom and its directly
bonded proton was reinvestigated. The high degree of specific
isotopic enrichment attainable with the auxotroph permits direct
observation and measurement of this coupling constant in
proton-coupled ¹³C NMR spectra at 67.89 MHz and at 15.1 MHz. In
contrast to the earlier study, the present results indicate that this
coupling constant does respond to a microscopic ionization with pKa
near 7.0; moreover, the magnitudes of the values of ¹J(C-H) observed
are in accord with those expected for titration of the histidyl
residue. We conclude that the original measurement must be in error
and that this coupling constant now also supports a histidyl residue
that titrates more or less normally as a component of the catalytic
triad of serine proteases.

A "catalytic triad" comprised of the side-chain functional groups 
of aspartic acid, histidine, and serine has thus far proved to be 
an invariant feature of the active sites of serine proteinases as 
demonstrated by x-ray diffraction studies (1-6). The ubiquity
and diversity of individual enzymes belonging to this class
suggests that this array of Asp-His-Ser residues possesses special
catalytic properties. The precise mode of operation of this triad
in serine protease-catalyzed hydrolysis of amides and esters is,
therefore, of considerable interest.

A prerequisite to the understanding of the effectiveness of
this triad is a knowledge of the ionization behavior of its component
functional groups, and this has been a controversial issue.
A histidyl residue is essential for activity (7-10), and because
the activities of serine proteinases increase with pH in a
manner indicative of the titration of a single group having a pKa
≈7.0 (11), this ionization was originally assumed to represent
that of the particular histidyl residue. However, Hunkapiller
et al. (12) proposed that this pKa of 7.0 should instead be assigned
to the aspartic acid residue and that the histidyl residue
should be assigned a pKa of less than 4.0. The experimental basis
for this proposal was a determination that the coupling constant
between C2 of the histidyl residue in the catalytic triad of α-lytic
protease and its directly bonded proton was independent of pH
over the range 4.0-8.0 and indicative of a neutral imidazole
ring. The result of this effective reversal of normal pKa assignments
is to make the aspartic acid carboxylate the ultimate
charge donor in the operation of the so-called "charge-relay"
mechanism (1, 12) of attack on the peptide bond.

The hypothesis that histidyl residues in the catalytic triads
of serine proteases are abnormally weak bases, whereas the
corresponding aspartic acid residues are abnormally weak acids,
has received considerable support, both experimental (13-18)
and theoretical (19-23). There are, however, other experimental
results (24-28) that indicate more normal ionization behavior;
at one time, substantial controversy on this point existed.
Recent ¹⁵N (29) and ¹H NMR (30-32) studies strongly indicate
that histidyl residues at the catalytic site titrate more or less
normally. Nevertheless, the experimental data originally supporting
the pKa-reversal hypothesis remain to be reconciled
with these studies. Especially troublesome are the measurements
of the histidyl ¹J(¹³C2-H) coupling constant for α-lytic protease
(12) because this result is difficult to attribute to anything
but a histidyl residue with an abnormally low pKa.

The reported measurements of ¹J(¹³C2-H) were subject to difficulties.
A major problem is that the difference in magnitude of
this coupling constant between the protonated (≈218 Hz) and
neutral (≈208 Hz) forms of the imidazole ring is small, and its
measurement in α-lytic protease was hampered by large linewidths
and by background natural-abundance resonances that
obscured one line of the doublet. Therefore, determination of
the coupling required measurement of 1/2 J or the taking of
difference spectra. Indeed, whether this measurement could
be made with sufficient precision under these circumstances
has been questioned (26, 33).
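
A rough sense of the precision required can be had from a simple model, sketched here in Python. It assumes, purely for illustration, a single titrating group in fast exchange, so that the observed coupling constant is the population-weighted average of the two limiting values. The limiting values (about 218 Hz protonated and about 208 Hz neutral) and the pKa of 7.0 are the figures discussed in the text; the function name j_observed and the fast-exchange assumption are introduced here and are not taken from the original study.

```python
def j_observed(ph: float, pka: float = 7.0,
               j_protonated: float = 218.0, j_neutral: float = 208.0) -> float:
    """Population-weighted coupling constant (Hz) for one titrating group.

    Assumes fast exchange between the protonated and neutral imidazole forms
    and a simple Henderson-Hasselbalch titration (illustrative model only).
    """
    fraction_protonated = 1.0 / (1.0 + 10.0 ** (ph - pka))
    return j_neutral + (j_protonated - j_neutral) * fraction_protonated


# Tabulate the modeled coupling constant across the pH range studied.
for ph in (4.0, 5.0, 6.0, 7.0, 8.0):
    print(f"pH {ph:.1f}: {j_observed(ph):.1f} Hz")
```

Under these assumptions the whole titration spans only about 10 Hz, so a pKa near 7.0 and a pKa below 4.0 differ by a few hertz at most over pH 4.0-8.0, which is why linewidths and overlapping background resonances matter so much.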

Improved NMR instrumentation operating at higher mag- 
netic field offers the possibility of enhancing the accuracy of the 
measurements because, at higher fields, interference from 
background natural-abundance signals should be substantially 
reduced. Also, a histidine-requiring mutant of Lysobacter en- 
zymogenes is now available which allows one to achieve a higher 
specific enrichment and, thus, to obtain improved signal 
detection and resolution. In view of these improved prospects 
for measuring this coupling constant and the difficulties asso- 
ciated with the earlier study, we report here a reexamination 
of its pH dependence in a-lytic protease. 

MATERIALS AND METHODS 

L-Histidine, selectively enriched with ¹³C at C2, was obtained
from Isotope Labelling (Whipp, NJ) or KOR Isotopes (Cambridge,
MA), and was synthesized from L-2,5-diamino-4-ketovaleric
acid and KS¹³CN as described by Ashley and Harrington
(34) and Heath et al. (35). Each preparation was judged to
be roughly equivalent in regard to purity and specific ¹³C
enrichment (≈92%) by NMR spectroscopy. Ac-L-Ala-L-Pro-
L-Ala-p-nitroanilide was synthesized as described by Hunkapiller
et al. (36) and used to assay the activity of the enzyme.

The ¹³C-labeled histidyl-α-lytic protease was prepared and
purified by culturing a histidine-requiring mutant of L. enzymogenes
using the previously described procedures (12, 29). The



* Presented in part at the Ninth International Conference on Magnetic
Resonance in Biological Systems, Bender, France, September 1-6,
1980.



7324 Biochemistry: Bachovchin et al. Proc. Natl. Acad. Sci. USA 78 (1981)





[Figure 1: spectra A and B; horizontal axis in ppm from TMS, 200 to 100.]

FIG. 1. Proton-decoupled 67.89-MHz ¹³C NMR spectra of α-lytic protease. (A) [2-¹³C]Histidyl-enriched α-lytic protease (≈3 mM at pH 4.7; 6400
scans with a recycle time of 0.84 sec). (B) Natural-abundance α-lytic protease (≈8 mM at pH 6.0; 46,000 scans with a recycle time of 2 sec).



peptidase activity of α-lytic protease was assayed against
Ac-L-Ala-L-Pro-L-Ala-p-nitroanilide (4 × 10^[illegible] M in 0.05 M Tris
buffer, pH 8.75, at 25°C). Based on A[illegible] = 8.9, purified
preparations of α-lytic protease used in these NMR studies exhibited
kcat/Km values of 2.0 × 10^[illegible] M⁻¹ s⁻¹ as compared to a value of
1.5 × 10^[illegible] M⁻¹ s⁻¹ reported previously (36).

¹³C NMR spectra were recorded at 67.89 MHz on a Bruker
HX-270 spectrometer and at 15.08 MHz on a Bruker WP-60
spectrometer; 10-mm probes were used with both instruments.
The NMR samples were 1-5 mM in α-lytic protease and were



[Figure 2: spectra A and B; horizontal axis in ppm from TMS, 200 to 100.]

FIG. 2. Proton-coupled 67.89-MHz ¹³C NMR spectra of [2-¹³C]histidyl-enriched α-lytic protease. (A) Enzyme (1.6 mM) at pH 5.54 (25,300 scans
with a recycle time of 0.84 sec). (B) Enzyme (1.3 mM) at pH 8.24 (38,500 scans with a recycle time of 0.84 sec).







FIG. 3. Comparison of representative high and low pH doublets from 67.89-MHz proton-coupled spectra of [2-¹³C]histidyl-enriched α-lytic
protease. (Line style not legible), enzyme (1.34 mM) at pH 8.24 (38,650 scans); —, 1.5 mM enzyme at pH 5.25 (51,960 scans).



prepared by dissolving lyophilized powders of enzyme in 0.1
M KCl. About 15% of ²H₂O was added to provide an internal
field-frequency lock signal. The relatively sharp signal in ¹³C
NMR spectra of α-lytic protease arising from the guanidinium
carbons of the 12 arginine residues (and previously assigned a
chemical shift of 157.25 ppm relative to tetramethylsilane) was
used as an internal reference after its position relative to internal
dioxane was verified to be the same at high and low pH. Chemical
shifts are reported in ppm from tetramethylsilane.

In general, 67.89-MHz ¹³C spectra were acquired by using
a 90° radiofrequency pulse (26 μs), a spectral width of 16,000
Hz, and 8000 data points. The ¹³C spectra at 15.08 MHz were
acquired with a 90° pulse (21 μs), a spectral width of 4000 Hz,
and 2000 data points.

The pH of the solution and the specific activity of the enzyme 
were checked both before and after recording each spectrum; 
only for those samples which exhibited no discernible change 
in these parameters are spectra reported here. The pH of the 
sample was varied by the addition of 0.25-0.5 M NaOH or HCl. 



RESULTS AND DISCUSSION 

Representative proton-decoupled 67.89-MHz ¹³C NMR spectra
of unlabeled α-lytic protease and of [2-¹³C]histidyl-labeled
α-lytic protease are compared in Fig. 1. The large single resonance
at 135 ppm present only in the spectrum of the isotopically
enriched enzyme is clearly that of the ¹³C-labeled carbon of the
histidyl residue. The pH dependence of the chemical shift of
this resonance is the same as reported earlier (12). Representative
proton-coupled ¹³C NMR spectra at high and low pH are
shown in Fig. 2; now both lines of the doublet are clearly
resolved at high and low pH, so that ¹J(C-H) can be measured
directly from the peak separation. Six independent determinations
of ¹J(C-H) were made at pH values of 4.66, 5.25, 5.35, 5.47,
5.54, and 6.02, which gave values for ¹J(C-H) of 219, 217, 219, 217,
217, and 216 Hz, respectively. Two determinations of ¹J(C-H) at
pH 8.24 and 8.44 gave values of 208 and 204 Hz, respectively.
Either Lorentzian or parabolic interpolation of the peak positions
yielded the same value for ¹J(C-H). The curves in Fig. 3 for
representative high and low pH doublets demonstrate that
¹J(C-H) does change with pH.
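
The arithmetic behind these determinations can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' processing: the coupling constants are those quoted above, while the synthetic doublet (line positions, 10-Hz half-width, 2-Hz point spacing) and the helper names `parabolic_peak` and `lorentzian` are assumptions introduced here to show how a three-point parabolic interpolation recovers a peak separation.

```python
from statistics import mean

# Coupling constants (Hz) quoted in the text, keyed by pH.
LOW_PH = {4.66: 219, 5.25: 217, 5.35: 219, 5.47: 217, 5.54: 217, 6.02: 216}
HIGH_PH = {8.24: 208, 8.44: 204}

mean_low = mean(list(LOW_PH.values()))
mean_high = mean(list(HIGH_PH.values()))
difference = mean_low - mean_high


def parabolic_peak(xs, ys):
    """Interpolate the position of the tallest sample with a 3-point parabola."""
    i = max(range(1, len(ys) - 1), key=lambda k: ys[k])
    left, top, right = ys[i - 1], ys[i], ys[i + 1]
    offset = 0.5 * (left - right) / (left - 2.0 * top + right)
    return xs[i] + offset * (xs[1] - xs[0])


def lorentzian(x, center, half_width):
    """Unit-height Lorentzian line (hypothetical line shape for the demo)."""
    return 1.0 / (1.0 + ((x - center) / half_width) ** 2)


# Synthetic doublet: hypothetical positions 217 Hz apart, sampled every 2 Hz.
xs = [2.0 * k for k in range(300)]
ys = [lorentzian(x, 190.7, 10.0) + lorentzian(x, 407.7, 10.0) for x in xs]
mid = len(xs) // 2
splitting = parabolic_peak(xs[mid:], ys[mid:]) - parabolic_peak(xs[:mid], ys[:mid])

print(mean_low, mean_high, difference, round(splitting, 2))
```

The averages make the size of the effect explicit: the low-pH and high-pH groups differ by roughly 11 Hz, which is small compared with the coupling itself and explains why a cleanly resolved doublet matters.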

In addition to the high-field ¹³C NMR measurements at 67.89
MHz, the coupling constant was also determined by ¹³C NMR
spectroscopy at 15.1 MHz, and even at this lower magnetic
field, both lines of the doublet were sufficiently resolved to
allow direct measurement of the coupling. Two independent
determinations of the coupling constant in both the high and
low pH ranges gave effectively the same results as the measurements
at 67.89 MHz.

The present results indicate that this coupling constant does
respond to an ionization of the histidyl residue with a pKa near
7.0, and the original measurements (12) must be in error. The
source of this error is, at present, not clear, but possibly derives
from the presence of multiple forms of the enzyme (31) at acidic
pH. These forms can be resolved at 125 MHz where they are
in slow exchange (R. J. Kaiser and T. C. Perkins, personal
communication).
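
One way to picture a coupling constant that "responds to an ionization with a pKa near 7.0" is a two-state, fast-exchange model in which the observed value is the population-weighted average of the protonated and neutral forms. The sketch below is an assumption made for illustration, not an analysis from the paper: the limiting values (217.5 and 206 Hz) are simply the averages of the low-pH and high-pH determinations quoted above, and the function name `observed_coupling` is hypothetical.

```python
def observed_coupling(ph, pka=7.0, j_protonated=217.5, j_neutral=206.0):
    """Population-weighted coupling (Hz) for a two-state, fast-exchange model.

    Assumed model: the protonated fraction follows the
    Henderson-Hasselbalch relation for a single ionizing group.
    """
    fraction_protonated = 1.0 / (1.0 + 10.0 ** (ph - pka))
    return fraction_protonated * j_protonated + (1.0 - fraction_protonated) * j_neutral


for ph in (4.66, 6.02, 7.0, 8.24, 8.44):
    print(f"pH {ph:4.2f}: {observed_coupling(ph):6.2f} Hz")
```

Under these assumptions the predicted coupling stays near the protonated limit below pH 6 and near the neutral limit above pH 8, which is the qualitative pattern the reported values show.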

Consequently, the NMR data (¹⁵N, ¹³C, and ¹H) now support
a histidyl residue which titrates more or less normally as a
component of the active-site catalytic triads of serine proteases, at
least for the free enzyme in solution. Other experimental or
theoretical studies that support, as well as mechanistic schemes
based upon, the pKa-reversal hypothesis need reappraisal.

This work was supported by grants from the National Institutes of
Health (CM-27927 and CM 164221) and from Research Corporation. 
The high-field NMR experiments were performed at the NMR Facility 
for Biomolecular Research located at the F. Bitter National Magnet 
Laboratory (Massachusetts Institute of Technology). The NMR Facility 
is supported by Grant RR00995 from the Division of Research Re- 
sources of the National Institutes of Health and by National Science 
Foundation Contract C-670. 



1. Blow, D. M., Birktoft, J. J. & Hartley, B. S. (1969) Nature (London) 221, 337-340.

2. Stroud, R. M., Kay, L. M. & Dickerson, R. E. (1974) J. Mol. Biol. 83, 185-208.

3. Sawyer, L., Shotton, D. M., Campbell, J. W., Wendell, P. L., Muirhead, H., Watson, H. C., Diamond, R. & Ladner, R. C. (1978) J. Mol. Biol. 118, 137-208.

4. Matthews, D. A., Alden, R. A., Birktoft, J. J., Freer, S. T. & Kraut, J. (1977) J. Biol. Chem. 252, 8875-8883.

5. Codding, P. W., Delbaere, L. T. J., Hayakawa, K., Hutcheon, W. L. B., James, M. N. G. & Jurášek, L. (1974) Can. J. Biochem. 52, 208-220.

6. James, M. N. G., Delbaere, L. T. J. & Brayer, G. D. (1978) Can. J. Biochem. 56, 396-402.

7. Ong, E. B., Shaw, E. & Schoellman, G. (1964) J. Am. Chem. Soc. 86, 1271-1272.

8. Schoellman, G. & Shaw, E. (1962) Biochem. Biophys. Res. Commun. 7, 36-40.

9. Ray, W. J., Jr. & Koshland, D. E., Jr. (1960) Brookhaven Symp. Biol. 13, 135-150.






10. Weil, L., James, S. & Buchert, A. R. (1953) Arch. Biochem. Biophys. 46, 266-278.

11. Hess, G. P. (1971) Enzymes 3, 213-248.

12. Hunkapiller, M. W., Smallcombe, S. H., Whitaker, D. R. & Richards, J. H. (1973) Biochemistry 12, 4732-4743.

13. Koeppe, R. E., II & Stroud, R. M. (1976) Biochemistry 15, 3450-3458.

14. Markley, J. L. (1975) Acc. Chem. Res. 8, 70-80.

15. Markley, J. L. & Porubcan, M. A. (1976) J. Mol. Biol. 102, 487-509.

16. Faraggi, M., Klapper, M. H. & Dorfman, L. M. (1978) J. Phys. Chem. 82, 508-512.

17. Komiyama, M., Bender, M. L., Utaka, M. & Takeda, A. (1977) Proc. Natl. Acad. Sci. USA 74, 2634-2638.

18. Komiyama, M., Rosel, T. R. & Bender, M. L. (1977) Proc. Natl. Acad. Sci. USA 74, 23-25.

19. Scheiner, S., Kleier, D. A. & Lipscomb, W. N. (1975) Proc. Natl. Acad. Sci. USA 72, 2606-2610.

20. Scheiner, S. & Lipscomb, W. N. (1976) Proc. Natl. Acad. Sci. USA 73, 432-436.

21. Beppu, Y. & Yomosa, S. (1977) J. Phys. Soc. Jpn. 42, 1694-1700.

22. Amidon, G. L. (1974) J. Theor. Biol. 46, 101-109.

23. Kitayama, H. P. & Fukutome, H. (1976) J. Theor. Biol. 66, 1-18.

24. Robillard, G. & Shulman, R. G. (1972) J. Mol. Biol. 71, 507-511.

25. Robillard, G. & Shulman, R. G. (1974) J. Mol. Biol. 86, 519-540.

26. Robillard, G. & Shulman, R. G. (1974) J. Mol. Biol. 86, 541-558.

27. Bruice, T. C. (1976) Annu. Rev. Biochem. 45, 331-373.

28. Rogers, G. A. & Bruice, T. C. (1974) J. Am. Chem. Soc. 96, 2473-2481.

29. Bachovchin, W. W. & Roberts, J. D. (1978) J. Am. Chem. Soc. 100, 8041-8047.

30. Markley, J. L. & Ibanez, I. B. (1978) Biochemistry 17, 4627-4640.

31. Westler, W. M. (1980) Dissertation (Purdue Univ., Lafayette, IN).

32. Markley, J. L., Neves, D. E., Westler, W. M., Ibanez, I. B., Porubcan, M. A. & Baillargeon, M. W. (1980) Dev. Biochem. 10, 31-62.

33. Egan, W., Shindo, H. & Cohen, J. (1977) Annu. Rev. Biophys. Bioeng. 6, 383-417.

34. Ashley, J. H. & Harrington, R. (1930) J. Chem. Soc., 2586-2590.

35. Heath, H., Lawson, A. & Rimington, C. (1951) J. Chem. Soc., 2215-2222.

36. Hunkapiller, M. W., Forgac, M. D. & Richards, J. H. (1976) Biochemistry 15, 5581-5588.



Exhibit 6 



Perspectives in 
Bioconjugate Chemistry 



Edited by 

Claude F. Meares 

University of California 




American Chemical Society, Washington, DC 1993






Library of Congress Cataloging-in-Publication Data

Perspectives in bioconjugate chemistry / edited by Claude F. Meares.
p. cm.

Contains a collection of articles previously published in the journal:
Bioconjugate chemistry.

Includes bibliographical references and index.

ISBN 0-8412-2672-5
1. Bioconjugates.

I. Meares, Claude F., 1946- . II. American Chemical Society.
QP517.B49P47 1993

574.19'2— dc20 93- 15385 



The paper used in this publication meets the minimum requirements of American
National Standard for Information Sciences — Permanence of Paper for Printed
Library Materials, ANSI Z39.48-1984.



Copyright © 1993 
American Chemical Society 

All Rights Reserved. The appearance of the code at the bottom of the first page
of each chapter in this volume indicates the copyright owner's consent that
reprographic copies of the chapter may be made for personal or internal use or
for the personal or internal use of specific clients. This consent is given on the
condition, however, that the copier pay the stated per-copy fee through the
Copyright Clearance Center, Inc., 27 Congress Street, Salem, MA 01970, for
copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law.
This consent does not extend to copying or transmission by any means — graphic
or electronic — for any other purpose, such as for general distribution, for
advertising or promotional purposes, for creating a new collective work, for
resale, or for information storage and retrieval systems. The copying fee for each
chapter is indicated in the code at the bottom of the first page of the chapter.

The citation of trade names and/or names of manufacturers in this publication is
not to be construed as an endorsement or as approval by ACS of the commercial
products or services referenced herein; nor should the mere reference herein to
any drawing, specification, chemical process, or other data be regarded as a
license or as a conveyance of any right or permission to the holder, reader, or any
other person or corporation, to manufacture, reproduce, use, or sell any patented
invention or copyrighted work that may in any way be related thereto. Registered
names, trademarks, etc., used in this publication, even without specific indication
thereof, are not to be considered unprotected by law.



CIP 




PRINTED IN THE UNITED STATES OF AMERICA 



Chapter 4 



A Brief Survey of Methods for Preparing Protein
Conjugates with Dyes, Haptens, and Cross-Linking
Reagents

Michael Brinkley 

Molecular Probes, Inc., 4849 Pitchford Avenue, Eugene, OR 97402 

Reprinted from Bioconjugate Chemistry, Vol. 3, No. 1, January/February, 1992



I. INTRODUCTION 

Modification of proteins, DNA, and other biopolymers
by labeling them with reporter molecules has become a
very powerful research tool in immunology, histochemistry,
and cell biology. A number of excellent reviews of
this subject have been published ( . In addition, there
are a growing number of commercial applications of these
modified biomolecules, including clinical immunoassays,
DNA hybridization tests, and gene fusion detection
systems. In these techniques, a small molecule with special
properties, such as fluorescence or binding specificity, is
covalently bound to a protein, a DNA strand, or other
biomolecule. Specific examples include fluorescent-labeled
antibodies for detection and localization of cell-surface
antigens, biotin-labeled single-stranded DNA
probes for detection of DNA hybridization, and hapten-labeled
proteins that, when introduced into a suitable host
animal, generate hapten-specific antibodies.

This review will focus on the experimental design and
procedures for preparing protein conjugates with dyes,
biotin, and haptens such as drugs and hormones. Methods
for covalently linking two unlike biopolymers through the
judicious choice of cross-linking reagents will also be
discussed. The following specific topics will be addressed:
(a) reactive groups of proteins that are available for
modification, including their naturally occurring amino
acids, and reactive groups introduced by chemical modification,
(b) reagents that can be used to couple molecules
to these reactive sites, (c) experimental procedures for
preparing conjugates, (d) purification and isolation of
conjugates, and (e) techniques for determining the degree
of labeling.

II. GENERAL DISCUSSION OF METHODS 

A. Reactive Groups of Proteins. Proteins and peptides
are amino acid polymers containing a number of
reactive side chains. In addition to, or as an alternative
to, these intrinsic reactive groups, specific reactive moieties
can be introduced into the polymer chain by chemical
modification. These groups, whether or not they are
naturally a part of the protein or are artificially introduced,
serve as "handles" for attaching a wide variety of molecules,
including other proteins. The intrinsic reactive groups of
proteins are described in the following section.

(1) Amines (Lysines, α-Amino Groups). One of the most
common reactive groups of proteins is the aliphatic ε-amine
of the amino acid lysine. Lysines are usually present to
some extent and are often quite abundant. For example,
the protein bovine insulin contains only a single lysine
amine, while avidin, a protein found in egg whites, contains
36 lysines (7). Lysine amines are reasonably good nucleophiles
above pH 8.0 (pKa = 9.18) (8) and therefore
react easily and cleanly with a variety of reagents to form
stable bonds (eq 1).

Protein-NH₂ + RX → Protein-NHR + HX (1)

Other reactive amines that are found
in proteins are the α-amino groups of the N-terminal amino
acids. The α-amino groups are less basic than lysines and
are reactive at around pH 7.0. Sometimes they can be
selectively modified in the presence of lysines. There is
usually at least one α-amino group in a protein, and in the
case of proteins that have multiple peptide chains or several
subunits, there can be more (one for each peptide chain
or subunit). Bovine insulin has one N-terminal glycine
residue and one N-terminal phenylalanine (9). There are
proteins that do not possess free α-amino groups, such as
cytochrome c and ovalbumin. In these molecules, the
N-terminal amino group is N-acylated and therefore not
reactive toward the usual modification reagents. Since
either N-terminal amines or lysines are almost always
present in any given protein or peptide, and since they are
easily reacted, the most commonly used method of protein
modification is through these aliphatic amine groups.
(2) Thiols (Cystine, Cysteine, Methionine). Another
common reactive group in proteins is the thiol residue
from the sulfur-containing amino acid cystine and its
reduction product cysteine (or half-cystine), which are
counted together as one of the 20 amino acids. Cysteine
contains a free thiol group, which is more nucleophilic


2672-5/93/0059$06.00/0
© 1992 American Chemical Society



than amines and is generally the most reactive functional
group in a protein. It reacts with some of the same
modification reagents as do the amines discussed in the
previous section and in addition can react with reagents
that are not very reactive toward amines. Thiols, unlike
most amines, are reactive at neutral pH, and therefore
they can be coupled to other molecules selectively in the
presence of amines (eq 2).

NH₂-Protein-SH + RX → NH₂-Protein-SR + XH (2)

This selectivity makes the thiol
group the linker of choice for coupling two proteins
together, since methods which only couple amines (e.g.,
glutaraldehyde, dimethyl adipimidate coupling) can result
in formation of homodimers, oligomers, and other unwanted
products (10). Since free sulfhydryl groups are
relatively reactive, proteins with these groups often exist
in their oxidized form as disulfide-linked oligomers or have
internally bridged disulfide groups. Immunoglobulin M
is an example of a disulfide-linked pentamer, while immunoglobulin
G is an example of a protein with internal
disulfide bridges bonding the subunits together. In
proteins such as this, reduction of the disulfide bonds with
a reagent such as dithiothreitol (DTT) is required to
generate the reactive free thiol (11). In addition to cystine
and cysteine, some proteins also have the amino acid
methionine, which contains sulfur in a thioether linkage.
When cysteine is absent, methionine can sometimes react
with thiol-reactive reagents such as iodoacetamides (12).
However, selective modification of methionine is difficult
to achieve and therefore is seldom used as a method of
attaching small molecules to proteins.

(3) Phenols (Tyrosine). The phenolic substituent of
the amino acid tyrosine can react in two ways. The
phenolic hydroxyl group can form esters and ether bonds,
and the aromatic ring can undergo nitration or coupling
reactions with reagents such as diazonium salts at the
position adjacent to the hydroxyl group. There is considerable
literature describing the reaction of tyrosyl
residues with diazonium compounds (13). For example,
a p-aminobenzoyl biocytin derivative has been diazotized
and reacted with protein tyrosine groups (14). Modification
of tyrosines has primarily been used in structural
studies, rather than as a means for attaching specific labels,
since acetylation and nitration can give useful information
concerning the participation of tyrosine in the binding
properties of proteins. Often, the reactivity of tyrosines
with amine-selective modification reagents to form unstable
carboxylic acid esters or sulfate esters is an unwanted
side reaction resulting in conjugates that slowly hydrolyze
during storage. Methods for preventing this problem are
discussed in a later part of this teaching editorial (section
V.B.1).

(4) Carboxylic Acids (Aspartic Acid, Glutamic Acid).
Proteins contain carboxylic acid groups at the carboxy-terminal
position and within the side chains of the dicarboxylic
amino acids aspartic acid and glutamic acid.
The low reactivity of carboxylic acids in water usually
makes it difficult to use these groups to selectively modify
proteins and other biopolymers. In the cases where this
is done, the carboxylic acid group is usually converted to
a reactive ester by use of a water-soluble carbodiimide
and then reacted with a nucleophilic reagent such as an
amine or a hydrazide (15, 16).

Protein-COOH → Protein-COX; Protein-COX + RNHNH₂ → Protein-CONHNHR (3)

The amine reagent should
be weakly basic in order to react specifically with the
activated carboxylic acid in the presence of the other
amines on the protein. This is because protein cross-linking
can occur when the pH is raised to above 8.0, the
range where the protein amines are partially unprotonated
and reactive. For this reason, hydrazides, which
are weakly basic, are useful in coupling reactions with a
carboxylic acid (17). This reaction can also be used
effectively to modify the carboxy-terminal group of small
peptides.

(5) Other Amino Acid Side Chains (Arginine, Histidine,
Tryptophan). Chemical modification of other amino
acid side chains in proteins has not been extensive,
compared to the groups discussed above. The high pKa
of the guanidine functional group of arginine (pKa = 12-13)
necessitates more drastic reaction conditions than most
proteins can survive. Arginine modification has been
accomplished primarily with glyoxals and α-diketone
reagents (18). Tryptophan modification requires harsh
conditions and is seldom carried out except as a method
of analysis in structural or activity studies. Histidines
have been subjected to photooxidation (19) and reaction
with iodoacetates (20).

B. Protein Modification Reagents. This section will
survey the extensive selection of reagents that are available
for the purpose of protein modification. The fundamental
principles for understanding how to use these reagents
are (1) recognition of the reactive group(s) on the protein
or peptide that can be modified and (2) knowledge of the
type of chemical reactions these reactive groups will
participate in and the nature of the chemical bonds that
will result from these reactions.

(1) Amine-Reactive Reagents. These reagents are those
which will react primarily with lysines and the α-amino
groups of proteins and peptides under both aqueous and
nonaqueous conditions. Some amine-reactive reagents are
more reactive, and therefore less selective, than others,
and it will be necessary to understand this property in
order to choose the best reagent for modification of a
specific protein. The following amine-reactive reagents
are available.

(a) Reactive Esters (Formation of an Amide Bond).
Reactive esters, especially N-hydroxysuccinimide (NHS)
esters, are among the most commonly used reagents for
modification of amine groups (21). These reagents have
intermediate reactivity toward amines, with high selectivity
toward aliphatic amines. Their reaction rate with
aromatic amines, alcohols, phenols (tyrosine), and histidine
is relatively low. Reaction of NHS esters with amines
under nonaqueous conditions is facile, so they are useful
for derivatization of small peptides and other low molecular
weight biomolecules. The optimum pH for reaction
in aqueous systems is 8.0-9.0. The aliphatic amide
products which are formed are very stable (eq 4).

Protein-NH₂ + RC(=O)-O-N(succinimidyl) → Protein-NHC(=O)R + HO-N(succinimidyl) (4)

The NHS esters are slowly hydrolyzed by water (22), but are
stable to storage if kept well desiccated. Virtually any
molecule that contains a carboxylic acid or that can be
chemically modified to contain a carboxylic acid can be
converted into its NHS ester (eq 5), making these reagents
among the most powerful protein-modification reagents
available.

R-COOH → RC(=O)-O-N(succinimidyl) (5)

Newly developed NHS esters are available with
sulfonate groups that have improved water solubility (23).
A short list of reactive NHS ester derivatives of fluorescent
probes, biotin, and other molecules is given in Table I.

(b) Isothiocyanates (Formation of a Thiourea Bond).
Isothiocyanates, like NHS esters, are amine-modification
reagents of intermediate reactivity and form thiourea
bonds with proteins and peptides (eq 6).

Protein-NH₂ + RN=C=S → Protein-NHC(=S)NHR (6)

They are
somewhat more stable in water than the NHS esters and
react with protein amines in aqueous solution optimally
at pH 9.0-9.5. Since this is a higher pH than the optimal
pH for NHS esters (which undergo competing hydrolysis
at pH 9.0-9.5), isothiocyanates may not be as suitable as
NHS esters when modifying proteins that are sensitive to
alkaline pH conditions. One of the most commonly used
fluorescent derivatization reagents for proteins is fluorescein
isothiocyanate (FITC). A number of other fluorescent
dyes (coumarins and rhodamines) have been
coupled to proteins via their reactive isothiocyanates (24).

(c) Aldehydes (Formation of Imine, Reduction to Alkylamine
Bond). Aldehyde groups react under mild aqueous
conditions with aliphatic and aromatic amines to form an
intermediate known as a Schiff base (an imine), which can
be selectively reduced by the mild reducing agent sodium
cyanoborohydride to give a stable alkylamine bond (eq 7)
(44, 63). This method of amine modification is not used



Table I. Succinimidyl Ester Probes

probe                                                                 function                 ref
succinimidyl fluorescein-5-(and -6-)carboxylate                       fluorescent label        76, 76
succinimidyl N,N,N',N'-tetramethylrhodamine-5-(and -6-)carboxylate    fluorescent label        76
succinimidyl 7-amino-4-methylcoumarin-3-acetate                       fluorescent label        77
succinimidyl X-rhodamine-5-(and -6-)carboxylate                       fluorescent label        75, 78
succinimidyl D-biotin                                                 ligand, affinity label   79
succinimidyl 3-(4-hydroxyphenyl)propionate                            radioiodination label    80

[The structure column of Table I consisted of drawings and is not reproduced.]



Protein-NH₂ + RCH=O → Protein-N=CHR → Protein-NHCH₂R (second step: NaBH₃CN) (7)



in protein conjugations as frequently as the activated ester
method, but when the molecule to be attached has an
aldehyde group, or can be easily converted to an aldehyde,
the method is mild, simple, and very effective. Aldehydes
(glyoxals) can also react with protein arginine
groups (25, 26) and the nucleic acid base guanosine, making
them of some use in nucleic acid modification (27).

(d) Sulfonyl Halides (Formation of a Sulfonamide
Bond). Sulfonyl halides are highly reactive amine-modifying
reagents. They are unstable in water, especially
at the pH required for reaction with aliphatic amines, but
they form extremely stable sulfonamide bonds which can
survive even amino acid hydrolysis (eq 8).

Protein-NH₂ + R-S(=O)₂-Cl → Protein-NH-S(=O)₂-R + HCl (8)

It is for this
reason that sulfonamide conjugates are useful for amine-terminus
derivatization (Dansyl-Edman degradation) and
as tracers (28). In addition to amines, sulfonyl halides
also react with phenols (tyrosine), thiols (cysteine), and
imidazoles (histidine) on proteins (29); therefore, they are
less selective than either NHS esters or isothiocyanates.
The conjugates formed with thiols, imidazoles, and phenols
are all unstable and, if not removed during purification,
can lead to loss of the label from the protein during
long-term storage (see section V.B.1). One of the most
widely used long-wavelength fluorescent probes, Texas
Red, is a sulfonyl chloride. It has the longest wavelength
spectral properties of any of the common amine-reactive
fluorescent labeling reagents (30).

(e) Miscellaneous Amine-Reactive Reagents (Dichlorotriazines,
Alkyl Halides, Anhydrides). The dichlorotriazine
derivative of fluorescein, known as DTAF (I), has
high reactivity with protein amines and has been used to
prepare fluorescein tubulin with minimal loss of activity
(31). In addition to amines, dichlorotriazines will react
with alcohols at elevated temperatures (60-90 °C) and are
used to prepare polysaccharide conjugates (32). Some alkyl
halides, including iodoacetamides commonly used to
modify thiols, will react with amines of proteins if the pH
is in the range 9.0-9.5 (33). Other reagents that have been
used to modify amines of proteins are acid anhydrides.
Succinic anhydride is commonly used to succinylate amine
groups of basic proteins for the purpose of changing their
isoelectric point and other charge-related properties (34).
Mixed anhydrides derived from reaction of a carboxylic
acid with carbitol or 2-methylpropanol chloroformates (eq
9) are excellent reagents for modification of amines under
mild conditions (35).

R-COOH + ClC(=O)OCH₂CH(CH₃)₂ → RC(=O)OC(=O)OCH₂CH(CH₃)₂
Protein-NH₂ + RC(=O)OC(=O)OCH₂CH(CH₃)₂ → Protein-NHC(=O)R + HOC(=O)OCH₂CH(CH₃)₂ (9)

Of these, the carbitol mixed anhydride
is relatively water soluble and is the preferred reagent
for modification of amines in aqueous solution.

(2) Thiol-Reactive Reagents. Thiol-reactive reagents
are those that will couple to thiol groups on proteins to
give thioether-coupled products. These reagents react
rapidly at neutral (physiological) pH and therefore can be
reacted with thiols selectively in the presence of amine
groups.

(a) Haloacetyl Derivatives (Formation of a Thioether
Bond). These reagents (usually iodoacetamides) are
among the most frequently used reagents for thiol modification.
In most proteins, the site of reaction is at cysteine
groups that are either intrinsically present or that
result from reduction of cystines. The reaction of iodoacetate
with cysteine is approximately twice as fast as that
with bromoacetate and 20-100 times as rapid as that with
chloroacetate (36). As mentioned previously, in the
absence of cysteines, methionines can sometimes react
with haloacetamides (12). Reaction of haloacetamides
with thiols occurs rapidly at neutral pH at room temperature
or below, and under these conditions, most aliphatic
amines are unreactive. In addition to proteins, haloacetamides
have been reacted with thiolated peptides and
thiolated primers for DNA sequencing (37), and also with
RNA (on thiouridine) (38). The thioether linkages formed
from reaction of haloacetamides are very stable. A
potential problem in using iodoacetamides as modification
reagents is their instability to light, especially in solution;
therefore, they must be protected from light in storage
and during reaction. The fluorescein and rhodamine iodoacetamides
are among the most intensely fluorescent
sulfhydryl reagents available for protein and peptide
modification.

(b) Maleimides (Formation of a Thioether Bond). Maleimides
(eq 10) are similar to iodoacetamides in their
application as reagents for thiol modification; however,
they are more selective than iodoacetamides, since they
do not react with histidine, methionine, or thionucleotides
(39, 40). The optimum pH for the reaction of maleimides
is near 7.0. Above pH 8.0, hydrolysis of maleimides to
nonreactive maleamic acids can occur (41).

NH₂-Protein-SH + N-R maleimide → NH₂-Protein-S-(succinimide ring, N-R) (10)

(c) Miscellaneous Thiol-Reactive Reagents. These
reagents include bromomethyl derivatives and pyridyl disulfides.
The bromomethyl derivatives are similar in
reactivity to iodoacetamides. The haloalkyl derivatives
monobromobimane and monochlorobimane (II) react with
glutathione and other thiols in cells to give fluorescent
adducts, thus providing a method of quantitation of thiols
(42). Pyridyl disulfides react in an exchange reaction
with protein thiols to give mixed disulfides (eq 11) (43).

[Structure II: bimane derivative, X = Cl or Br.]



Prot«in-SH •¥ RS-S 




o 



ill) 



(3) Carboxylic Acid- and Aldehyde-Reactive Reagents.
(a) Amines and Hydrazides (Formation of Amide or Alkylamine
Bonds). Amines and hydrazides can be coupled to carboxylic
acids of proteins via activation of the carboxyl group by a
water-soluble carbodiimide followed by reaction with the amine
or hydrazide. As mentioned previously (section II.A.4), the
amine or hydrazide reagent must be weakly basic so that it
will react selectively with the carbodiimide-activated protein
in the presence of the more highly basic protein ε-amines
(lysines). The reaction of these probes with
carbodiimide-activated carboxyl groups leads to the formation
of stable amide bonds (eq 12).

    Protein-C(=O)OH + RN=C=NR'
        → Protein-C(=O)O-C(=NR)-NHR'
        → (+ R''NH2) Protein-C(=O)NHR'' + RNHC(=O)NHR'      (12)



Amines and hydrazides are also able to react with aldehyde
groups, which can be generated on proteins by periodate
oxidation of carbohydrate residues on the protein. In this
case, a Schiff base intermediate is formed (eq 13), which can
be reduced to an alkylamine with sodium cyanoborohydride, a
mild and selective water-soluble reducing agent (44) (see also
section II.B.1.c). Since the Schiff base formation is
reversible, it is possible to minimize formation of
protein-protein products by adding a large excess of amine or
hydrazide reagent.

    Protein-carbohydrate + NaIO4 → Protein-CH=O
    Protein-CH=O + (1) RNH2, (2) NaBH3CN → Protein-CH2NHR   (13)

(4) Bifunctional Reagents. Bifunctional, or cross-linking,
reagents are specialized reagents having reactive groups that
will form a bond between two different groups, either on the
same molecule or two different molecules. Bifunctional
reagents can be divided into two types: those with the same
reactive group at each end of the molecule (homobifunctional)
and those with different reactive groups at each end of the
molecule (heterobifunctional). Recent trends are heavily in
favor of the use of heterobifunctional cross-linkers where the
bifunctional reagent
has two reactive sites, each with selectivity toward different 
functional groups (amine reactive and thiol reactive, for 
example). These reagents, some of which are available in 
a range of chain lengths, are well-suited to the task of 
controlled coupling of unlike biomolecules, such as two 
different proteins. Table II lists some frequently used 
heterobifunctional cross-linkers along with their reactiv- 
ities and references describing their use. 



(a) Amine Reactive — Thiol or Protected Thiol. Because thiols
will react selectively in the presence of amines with a
variety of reagents, these functional groups are very useful
for attaching two different proteins together. Thiol-coupling
methods are frequently employed to prepare protein-enzyme
conjugates. If the proteins to be coupled do not contain
intrinsic thiols, the procedure is typically carried out by
introducing a single thiol group to an amine of one of the
proteins by means of a heterobifunctional reagent (eq 14).
Traut's reagent (iminothiolane) has been extensively used for
the purpose of introducing thiol groups selectively to
proteins (45, 46). Many other bifunctional reagents contain
both an amine-reactive and a protected thiol group, such as
succinimidyl (acetylthio)acetate (SATA) (47, 48) or
succinimidyl 3-(2-pyridyldithio)propionate (SPDP) (43, 49).
After deprotection, the thiol-containing protein is then
reacted with a thiol-reactive group on the other protein,
which has been introduced by a similar technique.
Alternatively, proteins with synthetic thiol groups that have
been introduced by modification can be used to couple to a
number of thiol-reactive derivatives of dyes, biotin, haptens,
or other molecules.

    Protein(1)-NH2 + succinimidyl-O-C(=O)CH2CH2SC(=O)CH3
        → Protein(1)-NHC(=O)CH2CH2SC(=O)CH3
        → Protein(1)-NHC(=O)CH2CH2SH
    Protein(1)-NHC(=O)CH2CH2SH + Protein(2)-NHC(=O)CH2I
        → Protein(1)-NHC(=O)CH2CH2SCH2C(=O)NH-Protein(2)    (14)

(b) Amine Reactive — Iodoacetamide. Iodoacetamides are
primarily thiol-reactive groups with the reaction occurring
rapidly at physiological pH, but they can react with amines
under more alkaline conditions (greater than pH 9.0) and long
reaction times (section II.B.2.a). Iodoacetamides can be
introduced into a protein or peptide that does not have
intrinsic thiols via amine-reactive derivatives (eq 15) (50).
The resulting modified protein can then be coupled to any
thiol-containing molecule. The second molecule is usually a
thiol-containing protein.

    Protein-NH2 + succinimidyl-O-C(=O)(CH2)nNHC(=O)CH2I
        → Protein-NHC(=O)(CH2)nNHC(=O)CH2I                  (15)

(c) Amine Reactive — Maleimide. The introduction of maleimides
into a protein or peptide can be carried out with
heterobifunctional reagents that have an amine-reactive group
at one end and the thiol-specific maleimide at the other end
(eq 16). The applications are very similar to those for the
iodoacetamides discussed in the preceding section. Specific
applications include coupling of ricin to monoclonal antibodies
(51) and linking of oligonucleotides to enzymes (52).

    Protein-NH2 + succinimidyl-O-C(=O)CH2CH2-N(maleimide)
        → Protein-NHC(=O)CH2CH2-N(maleimide)                (16)

Table II. Heterobifunctional Cross-Linking Reagents
(structure drawings not reproduced)

reagent                                            reactivity                  ref
-------------------------------------------------  --------------------------  ------
succinimidyl 3-(2-pyridyldithio)propionate (SPDP)  primary amine, thiol        49
succinimidyl trans-4-(N-maleimidylmethyl)-         primary amine, thiol        54, 48
  cyclohexane-1-carboxylate (SMCC)
succinimidyl (acetylthio)acetate (SATA)            primary amine, thiol        47, 48
4-[(succinimidyloxy)carbonyl]-α-methyl-α-          primary amine, thiol        56, 48
  (2-pyridyldithio)toluene (SMPT)
succinimidyl 4-[[(iodoacetyl)amino]methyl]-        primary amine, thiol        50
  cyclohexane-1-carboxylate (SIAC)
succinimidyl p-azidobenzoate (SAB)                 primary amine, nonselective 66

(d) Amine Reactive — Aldehyde. Aldehydes do not occur
naturally in proteins, but can be introduced in two ways. In
the first method, carbohydrate groups on proteins are treated
with an oxidizing reagent, such as sodium periodate, or are
converted via a galactose oxidase/catalase enzyme method, both
of which split the sugar to form aldehyde groups (53). Not all
proteins contain carbohydrate groups, and therefore a second
method of introducing aldehydes via the reagent glutaraldehyde
has been employed (10). Glutaraldehyde has been used
extensively to couple two proteins together via their amine
groups (eq 17); however, like other homobifunctional reagents,
glutaraldehyde is being replaced with more selective
heterobifunctional reagents such as those discussed above.

    Protein(1)-NH2 + Protein(2)-NH2 + O=CH(CH2)3CH=O
        → Protein(1)-NH(CH2)5NH-Protein(2)                  (17)

(5) Photoactivatable Reagents. Reagents are available that can
be activated by light (photons) to produce a reactive
intermediate that can couple to various functional groups on
biomolecules. Two of the most frequently used photoactivatable
reagents for this purpose are aromatic azides and
benzophenones.

(a) Aromatic Azides. Aromatic azides are efficiently
photolyzed by illumination with an ultraviolet light at
300-350 nm. The reactive molecule produced by this photolysis
is a nitrene, which reacts rapidly and nonspecifically with
either solvent molecules or with functional groups on
biomolecules. Almost any functional group or amino acid can be
modified, since the nitrene is very reactive. Recent
improvements in azide-based protein modification reagents have
resulted in perfluorinated azides that generate nitrene
intermediates with greater stability, thus giving reagents
with higher efficiency (up to 40%) of reaction with the
protein (57, 58). One of the primary uses of these highly
reactive reagents is to carry out photoaffinity labeling
experiments. In these experiments, the aromatic azide is
attached to a drug or other molecule which binds specifically
to a protein binding site (an example is an enzyme inhibitor
or a nucleotide analogue) and then photolyzed. The location
and type of bond formed in this process provides information
about the environment near the binding site (59). In addition
to their role as photoaffinity labels, aryl azides are useful
as heterobifunctional cross-linkers. Succinimidyl
azidobenzoate (SAB), p-azidophenacyl bromide, and
4-maleimidobenzophenone have been employed to couple proteins
through dark reaction with amines or thiols followed by light
activation (56, 58, 60, 61).

(b) Benzophenones. Benzophenones are like azides in that they
are photoactivatable by ultraviolet light, but once they have
been activated, they can either react with functional groups
or return to the ground state. Thus, these molecules can
sometimes be reactivated if they do not react on the first
activation. These reagents are also used as photoaffinity
labels in a manner similar to that of the aromatic azides (62).

III. PRACTICAL CONSIDERATIONS

Along with a thorough knowledge of protein reactivity and the
available reagents for the desired type of protein
modification, it is of crucial importance that the researcher
understand the practical aspects of carrying out reactions
between highly reactive small organic molecules and large,
complex, conformationally sensitive, water-soluble
biopolymers. The following discussion will address some of the
general rules, problems, and pitfalls of protein-modification
chemistry.

A. Choosing the Right Buffer. Conjugations should
be carried out in a well-buffered system at a pH that is 
optimal for the reaction. The ionic strength should, in 
most cases, be in the range of 25-100 mM. For modification
of thiol groups and α-amino groups, which occurs selectively
at physiological pH (7.0-7.5), phosphate buffers are
ideally suited. The more strongly basic lysine amines 
require more alkaline pH, in the range of 8.0-9.5, where 
phosphate solutions do not buffer well. For these reactions, 
carbonate/bicarbonate (pH of 100 mM bicarbonate is 9.2) 
or borate buffers are quite satisfactory. As an example, 
conjugations with NHS esters are best carried out in pH 
8.2 bicarbonate buffer, while isothiocyanates require the 
higher pH (9.0-9.5) provided by carbonate or borate 
buffers. The choice of buffer will in some cases be directed 
by compatibility of the protein. 
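
The buffer guidance in this section can be collected into a small lookup. The following is only a sketch: the dictionary keys and the function name are hypothetical labels introduced here, and the values are simply the buffer types and pH ranges stated in the paragraph above.

```python
# Buffer guidance transcribed from section III.A.
# Keys are hypothetical labels chosen for this sketch.
BUFFER_GUIDE = {
    "thiol": ("phosphate", 7.0, 7.5),
    "alpha_amine": ("phosphate", 7.0, 7.5),
    "nhs_ester": ("bicarbonate", 8.2, 8.2),
    "isothiocyanate": ("carbonate or borate", 9.0, 9.5),
}


def suggest_buffer(target):
    """Return (buffer, pH_low, pH_high) for a reagent or target class."""
    try:
        return BUFFER_GUIDE[target]
    except KeyError:
        raise ValueError(f"no guideline recorded for {target!r}") from None


if __name__ == "__main__":
    for name in BUFFER_GUIDE:
        print(name, suggest_buffer(name))
```

The final choice may still be dictated by the compatibility of the protein, which a lookup of this kind does not capture.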

B. Cosolvents. If the reagent that is to be attached
to the biomolecule is readily soluble at millimolar concentrations
in water or buffer, no cosolvent is needed, and
the reagent can be added as a concentrated aqueous 
solution to the buffered reaction solution. Unfortunately, 
aqueous systems are very often incompatible with the 
reagent, as a result of poor solubility or high reactivity 
with water. In these cases, a water-miscible cosolvent must 
be employed that will dissolve the reagent without causing 
its decomposition. At the same time, the cosolvent must 
not cause irreversible denaturation or precipitation of the 
biomolecule. Some cosolvents that have been successfully 
utilized in protein modifications are methanol, ethanol, 
2-propanol, 2-methoxyethanol, dioxane, dimethylforma- 
mide (DMF), and dimethyl sulfoxide (DMSO). 

The most versatile of these cosolvents are DMF and 
DMSO. They are recommended because of the following 
desirable properties: (a) they are inert to many of the 
reactive reagents used in preparing conjugates, (b) they 
are miscible with water in all proportions, and (c) they are 
compatible with most aqueous protein solutions even at 
up to 30% v/v ratios. DMF is the solvent of choice for 
reactions of sulfonyl chlorides, since these reagents will 
react with DMSO. It is usually important that cosolvents 
be carefully dried and stored over a drying agent to prevent
competing hydrolysis of the reactive modification reagent. 

C. Reaction Conditions. As a general rule, conjugation
reactions should be done at below room temperature, since the
rate of reaction of most conjugation reagents is rapid even at
low temperature. Low temperatures tend to increase the
selectivity of the reaction, resulting in fewer side reactions
and more consistent and reproducible results. A convenient
procedure is to add the reagent to a gently stirred buffered
solution of the protein in an ice bath and then allow the bath
to warm to room temperature over a period of about 2 h. Very
reactive reagents such as sulfonyl chlorides should be reacted
under more carefully controlled conditions, such as 4 °C for
1 h. Stirring can be done with a magnetic stir bar and should
not be excessively fast, since proteins can be denatured by
violent mixing. Addition of the reagent should be carried out
dropwise and as slowly as possible, since gradual addition
increases the selectivity of the reaction.

(1) Protein Concentration. Because the kinetics of conjugation
of these reagents is bimolecular, but the hydrolysis rates are
pseudo-first-order, dilution slows conjugation without slowing
hydrolysis, so that more reagent is lost to hydrolysis. Protein
concentrations above 10 µM are strongly recommended, with an
optimum in the range of 50-100 µM.
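
The competition between bimolecular conjugation and pseudo-first-order hydrolysis can be illustrated with a simplified model. The sketch below assumes that the concentration of reactive protein groups stays roughly constant during the reaction, so that both pathways are first-order in reagent. The rate constants in the example are arbitrary illustrative values, not measurements from this review.

```python
def fraction_conjugated(k_conj_per_M_s, protein_M, k_hyd_per_s):
    """Fraction of reagent consumed by conjugation rather than hydrolysis.

    Simplifying assumption: [protein] is treated as constant, so the
    conjugation pathway has an effective first-order rate k_conj * [protein].
    """
    conj_rate = k_conj_per_M_s * protein_M
    return conj_rate / (conj_rate + k_hyd_per_s)


if __name__ == "__main__":
    # Illustrative (assumed) rate constants only.
    k_conj = 100.0   # M^-1 s^-1
    k_hyd = 1.0e-3   # s^-1
    for conc_uM in (1, 10, 50, 100):
        f = fraction_conjugated(k_conj, conc_uM * 1e-6, k_hyd)
        print(f"{conc_uM:>4} uM protein: {f:.2f} of reagent conjugated")
```

Under this model the conjugated fraction rises monotonically with protein concentration, which is the reason dilute protein solutions label poorly.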

(2) pH. In modification of amines, only the unprotonated form
is reactive, and therefore it is necessary to maintain a pH at
which a significant number of amines are unprotonated. An
average pKa above 9 for lysines indicates that the higher the
pH, the better. Offsetting this are the factors that the rate
of reagent hydrolysis increases rapidly above pH 9 and that
proteins tend to be unstable at a higher pH. A free amine
terminus has a pKa near 7 and is sometimes preferentially
modified when the reaction is run at neutral pH. An effective
compromise in most cases is to use a pH close to 9.0-9.2 if
the protein is stable, but a lower pH combined with more
reagent and longer reaction times if the protein is unstable.
The succinimidyl esters and DTAF appear to react more
efficiently at a lower pH than the isothiocyanates and
sulfonyl chlorides. Our experience with succinimidyl esters
indicates that a reaction pH of around 8.2 gives excellent
results for most proteins.
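
The dependence of amine reactivity on pH follows from the Henderson-Hasselbalch relation. The sketch below computes the unprotonated fraction of an amine at a given pH. The lysine pKa of 10.0 used in the example is an assumed illustrative value (the text states only that the average is above 9), and the function name is introduced here for illustration.

```python
def fraction_unprotonated(pH, pKa):
    """Fraction of an amine present as the reactive free base.

    From Henderson-Hasselbalch: [B] / ([B] + [BH+]) = 1 / (1 + 10**(pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


if __name__ == "__main__":
    lysine_pKa = 10.0    # assumed for illustration
    terminus_pKa = 7.0   # "near 7" per the text
    for pH in (7.0, 8.2, 9.0, 9.5):
        print(
            f"pH {pH}: lysine {fraction_unprotonated(pH, lysine_pKa):.3f}, "
            f"amine terminus {fraction_unprotonated(pH, terminus_pKa):.3f}"
        )
```

With these assumed values, the amine terminus is half deprotonated at pH 7 while lysine side chains are almost entirely protonated, consistent with the preferential terminal modification noted above.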

(3) Reaction Time. Usually, 1-2 h is sufficient time for
conjugation reactions to go to completion. Longer reaction
times, if convenient, are acceptable, since the degree of
labeling is generally limited by the ratio of the reagent to 
protein, rather than the reaction time. Many published 
procedures specify overnight reaction times. Obviously, 
the more reactive the reagent, the shorter the reaction 
time; sulfonyl chloride reactions are faster than NHS ester 
reactions. 

IV. FACTORS INFLUENCING CHOICE OF MOLAR
RATIO OF REACTANTS

A. End Use of Reagents. (1) Immunogen — High Degree of
Labeling. Protein conjugates are frequently prepared for use
in producing specific antibodies to a drug or other hapten in
a host animal. The drug or hapten is conjugated to a high
molecular weight protein carrier molecule and injected into
the animal to elicit an immune response, and over a period of
time, specific antibodies to the drug or hapten are produced.
For these purposes, a high degree of labeling of the protein
carrier is desirable, since more labels generally increase the
strength and specificity of the immune response.

(2) Labeled Antibody or Enzyme — Low to Moderate Degree of
Labeling. Antibodies and enzymes are relatively sensitive to
substitution, since there are usually reactive amino acid side
chains (amines, thiols, histidines) in or near the binding
sites. For this reason, a low to moderate degree of labeling
is preferred in order to preserve binding specificity or
enzyme activity. Excessive labeling can also result in
decreased solubility of the conjugates, which also reduces the
overall activity. In the case of many fluorescent labels, a
high dye to protein ratio causes a dramatic decrease in the
fluorescence efficiency of the conjugates (63, 64). In our
experience with antibodies, a substitution ratio in the range
of 4-6 is usually optimal for good retention of binding
activity.

(3) Fluorescent Labeled Proteins/Peptides — Low to Moderate
Degree of Labeling. Fluorescent labels are often very
sensitive to their molecular environment and therefore their
fluorescence intensity is almost always decreased when they
are bound to proteins and other biomolecules. Fluorescence
also decreases when the fluorescent labels are located in
close proximity to one another, probably as a result of
transfer of excited-state energy (quenching) from one molecule
to another (65). When proteins are labeled with fluorescent
dyes, the fluorescence increases as more dyes are added; at
the same time, however, the fluorescence efficiency decreases
as a result of the quenching described above. Some dyes are
more sensitive to quenching than others. FITC is about 50-70%
quenched on IgG at a dye/protein ratio of 6 (66), while
Cascade Blue, a newly developed blue fluorescent dye (67),
retains nearly 100% of its fluorescence efficiency under the
same conditions. The number of dyes that can be conjugated to
a protein without substantial loss of fluorescence will depend
on the size of the protein and the distance between the
functional groups to which the label is attached. Usually,
more dyes can be attached to a large protein than a small
protein or peptide. A general rule for conjugates of
fluorescein is 4-6 dyes/protein and for rhodamines, 2-3
dyes/protein. The degree of labeling depends on the relative
reactivity of the labeling reagent to the protein and to
water, the molecular weight and number of reactive amines on
the protein, the reactant concentrations (especially of the
protein), and other factors. The exact amount of label to use
must be determined by experiment; however, as a guideline,
10 mol of a typical isothiocyanate or NHS ester is needed to
label 1 mol of a protein. Because of the faster competitive
hydrolysis rate, 20 mol of a sulfonyl chloride, such as Texas
Red, is required to label 1 mol of a protein.
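
The molar-ratio guideline translates directly into a volume of reagent stock. The sketch below is a unit-conversion helper under stated assumptions: the function name and parameters are introduced here for illustration, and the example concentrations are chosen from ranges mentioned elsewhere in this review.

```python
def reagent_stock_volume_uL(protein_uM, protein_volume_mL, molar_excess, stock_mM):
    """Volume of reagent stock (uL) giving the requested mol reagent / mol protein.

    Units: uM * mL = nmol of protein; nmol / mM = uL of stock.
    """
    protein_nmol = protein_uM * protein_volume_mL
    reagent_nmol = protein_nmol * molar_excess
    return reagent_nmol / stock_mM


if __name__ == "__main__":
    # 1 mL of 100 uM protein, 10 mM stock (illustrative values).
    print(reagent_stock_volume_uL(100, 1.0, 10, 10))  # isothiocyanate / NHS ester guideline
    print(reagent_stock_volume_uL(100, 1.0, 20, 10))  # sulfonyl chloride guideline
```

The computed volume is only a starting point, since the text notes that the exact amount of label must be determined by experiment.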

B. Number of Reactive Groups on the Protein. 
Proteins vary greatly in the number of reactive amino 
acid groups. For example, some proteins have 40 or more 
reactive amine groups, while others may have only one or 
two amines or thiol groups. The reactivity of these groups 
with the labeling reagent and their effective concentration 
in solution will then have an effect on the amount of 
labeling reagent required to achieve the desired degree of 
substitution. This means that small molecular weight 
proteins or peptides with few reactive groups will require 
more labeling reagent per gram than large molecular weight
proteins with many reactive groups. 

C. Solubility of Modification Reagent in Reaction
Solution. (1) Cosolvent Sometimes Required. The use of
cosolvents was explained in section III.B. In some cases the
labeling reagent is very hydrophobic and, even though it is
readily soluble in DMF or DMSO, it precipitates when added to
the buffered protein solution. It is often possible to
circumvent this problem by adding some cosolvent gradually,
with stirring, to the buffered protein solution until the
protein solution contains 20-25% cosolvent. The ionic strength
of the buffer should be no more than 60 mM so that the buffer
does not salt out upon addition of the cosolvent. Then the
solution of labeling reagent in cosolvent is added so that the
final volume percent cosolvent in the reaction mixture is
around 30%. This modification often is successful in
preventing precipitation of the labeling reagent. Many
proteins are stable in 30% DMSO or DMF; however, the stability
of the protein to these conditions should be determined before
carrying out this technique.
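
The two-step cosolvent addition can be worked out numerically. The sketch below assumes ideal volume additivity, which is only approximately true for DMF/water or DMSO/water mixtures, and treats the reagent stock as pure cosolvent. The function name and default fractions are illustrative choices based on the percentages in the paragraph above.

```python
def cosolvent_plan(protein_volume_mL, pre_fraction=0.20, final_fraction=0.30):
    """Return (pre_add_mL, stock_mL) for the two-step cosolvent addition.

    pre_add_mL: cosolvent added first so the protein solution is pre_fraction v/v.
    stock_mL:   reagent stock (in cosolvent) that then brings the mixture to
                final_fraction v/v. Assumes volumes are additive.
    """
    if not 0.0 <= pre_fraction < final_fraction < 1.0:
        raise ValueError("require 0 <= pre_fraction < final_fraction < 1")
    pre_add = protein_volume_mL * pre_fraction / (1.0 - pre_fraction)
    total_after_pre = protein_volume_mL + pre_add
    stock = (final_fraction * total_after_pre - pre_add) / (1.0 - final_fraction)
    return pre_add, stock


if __name__ == "__main__":
    pre, stock = cosolvent_plan(1.0)
    total = 1.0 + pre + stock
    print(f"pre-add {pre:.3f} mL, stock {stock:.3f} mL, "
          f"final cosolvent {(pre + stock) / total:.0%}")
```

The stock concentration would then be chosen so that this stock volume carries the desired molar excess of reagent.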

(2) Two-Stage Labeling as a Last Resort. If the
technique described in section IV.C.l is used and the 
labeling reagent still precipitates when added to the protein 
solution, it may be possible to purify the conjugate and 
then repeat the labeling procedure to increase the degree 
of substitution. 

D. Solubility of Conjugate. (1) Conjugate Is Often Less
Soluble Than Native Protein. Problems with solubility of the
conjugate can occur, most often when the labeling reagent is
hydrophobic or contains multiple ionic groups. These physical
properties of the label can upset the natural folding of the
protein and cause the conjugate to be significantly less
soluble than the native protein (30).

(2) Overlabeling Can Cause Precipitation of Conjugate.
Overlabeling can produce the same undesirable results noted
above. The best solution to these problems is to use a lower
ratio of labeling reagent to protein, resulting in a conjugate
with a lower degree of substitution.

V. PURIFICATION OF CONJUGATES 

A. Removal of Excess Noncovalently Bound Labeling
Reagent. (1) Dialysis — Simple, Inexpensive Purification
Method — Inefficient for Hydrophobic Molecules. Dialysis is
the simplest, but most time-consuming, method of purifying
protein conjugates. Not all molecules dialyze efficiently; the
rate of dialysis depends on their relative affinity for the
protein versus the dialysis solution. Molecules that are
sparingly soluble in water or strongly adsorbed to the protein
surface will take a long time to dialyze. Dialysis works best
when the labeling reagent and its unreacted byproducts are
hydrophilic. When purifying conjugates by dialysis, a dialysis
buffer volume of at least 100 times the volume of the
conjugate solution should be used and the dialysis buffer
should be changed at least five times. Allow at least 4 h for
dialysis between buffer changes.
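
The benefit of repeated buffer changes can be estimated with an idealized dilution model. The sketch below assumes that the free small molecule equilibrates completely between sample and buffer at every step and does not bind to the protein. As the text notes, this assumption fails for hydrophobic or strongly adsorbed reagents. The function and parameter names are introduced here for illustration.

```python
def residual_fraction(buffer_to_sample_ratio=100.0, equilibrations=5):
    """Fraction of free small molecule left in the sample after dialysis.

    Idealized: each equilibration dilutes the free species by
    1 / (1 + buffer_to_sample_ratio).
    """
    per_step = 1.0 / (1.0 + buffer_to_sample_ratio)
    return per_step ** equilibrations


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} equilibration(s): {residual_fraction(100.0, n):.2e} remaining")
```

In practice the 4 h interval may not reach equilibrium, so the real residual level is higher than this lower bound.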

(2) Gel Filtration — Faster Than Dialysis — Effectively Removes
Most Hydrophilic and Hydrophobic Labeling Reagents. Gel
exclusion chromatography separates conjugates from excess
noncovalently bound labeling reagent and other small molecular
weight impurities by selectively adsorbing the small molecules,
while allowing the larger protein conjugate molecules to pass
through the void space in the gel. This method is very fast
and effective for purifying conjugates from both hydrophobic
and hydrophilic labeling reagents. A common technique employs
a Sephadex G-25 or similar column containing about a 2-mL bed
volume/mg of protein that can be packed in any suitable buffer
(30). Upon elution in the case of dyes, the conjugate and free
dye bands are usually clearly visible; many other types of
labels can be visualized by holding a hand-held UV lamp close
to the column during chromatography. Automatic
fraction-collecting devices with UV monitors are also
frequently used. If partial precipitation has occurred during
the reaction, the samples should be centrifuged before running
the column. The solution of labeled protein will contain a
mixture of species with variable degrees of substitution. If
required, separation of the lightly and heavily labeled
fractions can be done by ion-exchange chromatography. Usually
one passage through a gel filtration column is sufficient to
remove most of the unreacted label; however, some proteins
bind small molecules with high avidity. To completely purify
these conjugates it may be necessary to carry out additional
purification steps.

(3) Hydrophobic Interaction Adsorbents — Removes Strongly
Bound Hydrophobic Labeling Reagents. Some labeling reagents
have a very strong affinity for certain proteins and cannot be
completely removed by gel filtration. These conjugates can be
further purified (after gel filtration to remove most of the
unreacted label) by
treatment with microporous, hydrophobic polystyrene 
beads (68). In this procedure, the conjugate is simply 
mixed with the beads, and the small hydrophobic molecules 
are selectively adsorbed into the micropores while the 
larger conjugate molecules are excluded. 

B. Removal of Labeling Reagent Attached by
Unstable Covalent Bonds. (1) Hydroxylamine Treatment —
Hydrolysis of Tyrosine Ester Bonds under Mild
Conditions, Section HA.d describes the formation of ty- 
rosine esters. Several of the reagents commonly used for 
protein modification, including NHS esters, isothiocy- 
anates, and sulfonyl chlorides, can react with tyrosines to 
form these esters. These adducts are unstable and can 
hydrolyze even at physiological pH, resulting in loss of 
label over a period of time. Since any measurable loss of 
label can interfere with the intended use of many con- 
jugates, it is advisable to pretreat all conjugates prepared 
with these types of reagents to remove any esters that 
may have formed in the conjugation reaction. This can 
be effectively done in most cases by treating the conjugate 
before purification with hydroxylamine (69, 70). In this 
method, a 1.6 M solution of hydroxylamine at pH 8.0 is 
added to the conjugate solution to a final concentration 
of 0.1 M and the solution is stirred at room temperature 
for 1 h. The conjugate is then purified by gel filtration 
or dialysis. 
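
The volume of hydroxylamine stock needed to reach the stated final concentration follows from a simple mass balance. The sketch below assumes additive volumes and uses the stock and final concentrations given in the paragraph above as defaults. The function name is introduced here for illustration.

```python
def stock_volume_to_add_mL(sample_volume_mL, stock_M=1.6, final_M=0.1):
    """Volume of stock to add so the mixture reaches final_M.

    Mass balance: stock_M * v = final_M * (sample_volume + v).
    """
    if final_M >= stock_M:
        raise ValueError("final concentration must be below stock concentration")
    return sample_volume_mL * final_M / (stock_M - final_M)


if __name__ == "__main__":
    v = stock_volume_to_add_mL(1.5)
    print(f"add {v:.3f} mL of stock to 1.5 mL of conjugate solution")
```

With the default values this corresponds to adding one volume of stock to fifteen volumes of conjugate solution.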

VI. EXPERIMENTAL METHODS FOR PREPARING 
PROTEIN CONJUGATES 

The general experimental procedures that follow de- 
scribe methods for conjugating amine-reactive and thiol- 
reactive probes to proteins. They should be useful as a 
guide for the experimentalist; however, it is strongly 
suggested that the numerous literature references given 
in this review and others be consulted for additional specific
information. Because of the very wide variety of experimental
conditions required for coupling proteins with
bifunctional reagents, it is difficult to generate a simple 
general procedure and the reader is advised to consult the 
literature for specific procedures. 

A. Amine-Reactive Probes. The following general
procedure is recommended for the first trial and is 
adaptable to amine-reactive dye, biotin, hapten, and 
bifunctional linker conjugations. The procedure may be 
modified after the degree of substitution has been deter- 
mined (see below) after purification. 

Step 1. Dissolve the protein at 50-100 µM in 50-100 mM sodium
bicarbonate buffer at pH 9.2 at room temperature. Borate
buffer is also suitable. Amine-based buffers, such as TRIS,
are not recommended. Conjugations with succinimide esters and
reagents such as DTAF
[6-[(4,6-dichlorotriazin-2-yl)amino]fluorescein] should be done
at a lower pH. In these cases, a suitable buffer is 50-100 mM
pH 8.2 sodium bicarbonate.

Step 2. Add sufficient protein-modification reagent 
from a stock solution to contain about 10 mol of isothio- 
cyanate or succinimide ester for each mole of protein or 
about 20 mol of sulfonyl chloride for each mole of protein. 



Although most protein modification reagents have some
solubility in water, it is recommended that a stock solution
be prepared immediately before use in a water-miscible
nonhydroxylic solvent such as dimethylformamide (DMF),
dimethyl sulfoxide (DMSO), or dioxane. The stock solution
should be prepared fresh each time, since it is very difficult
to store these solutions for any length of time without
decomposition of the reagent taking place. As a guideline, it
is recommended to prepare a stock solution at about 10-20 mM
of the protein-modification reagent in dry DMF. The
fluorescent dyes Texas Red, Lissamine rhodamine B, and other
sulfonyl chlorides must never be used in DMSO, with which they
react. These stock solutions (prepared in dry DMF) are usually
diluted about 10-fold into the protein solution while being
agitated to avoid high local concentrations of reagent. Some
reagents are quite hydrophobic, having little solubility in
the aqueous protein solution. This is particularly true of
some of the rhodamine and biotin succinimidyl esters. A
technique that helps in these cases is to add a 20% volume of
DMF or DMSO slowly to the protein/buffer solution before
adding the stock solution of the reagent in DMF or DMSO
(see section IV.C.1).

Isothiocyanates and Succinimidyl Esters. Add the solution of
the modification reagent dropwise, using a microliter syringe,
during a period of about 1 min to the stirred protein solution
while in an ice-water bath. Allow the reaction mixture to warm
to room temperature and continue to stir for at least 2 h.

Sulfonyl Chlorides. Add the solution of the reagent quickly
using a micropipet to the stirred protein solution in an ice
bath or in a cold room. Allow to react at 4 °C for 1 h.

Step 3. Separate the conjugate from unreacted dye on 
a gel filtration column using the appropriate buffer as 
described in section V. Texas Red and certain other 
rhodamine-based conjugates will still retain varying
amounts of noncovalently adsorbed dye even after puri- 
fication by gel chromatography. This protein-adsorbed 
dye can be removed by treating the conjugate with a 
hydrophobic adsorbent as described in section V.A.3. 

B. Thiol-Reactive Probes. A general procedure suitable for
conjugation of thiol-reactive probes, including maleimides,
iodoacetates, and alkyl halides, is outlined below. As a rule,
thiol-reactive reagents are more stable to water than the
reactive esters; however, they should be handled carefully and
stored in a freezer with protection from light and moisture.
As with the reactive esters and isothiocyanates discussed
above, only freshly prepared reagent solutions should be used.
Protection from light is particularly important for
iodoacetamides.

Step 1. Dissolve the protein at 50-100 µM in a suitable
buffer at pH 7.0-7.5 (10-100 mM phosphate, TRIS, 
HEPES) at room temperature. At this pH range, the 
protein thiol groups are sufficiently nucleophilic so that 
they react almost exclusively with the reagent in the 
presence of the more numerous protein amines, which are 
protonated and relatively unreactive. As a general rule, 
it is advisable to carry out thiol modifications in an oxygen- 
free environment, since some thiols can be oxidized to 
disulfides. This is particularly important if the modifi- 
cation reagent is to be reacted with a cystine group that 
has been previously reduced with a reagent such as dithio- 
threitol. In this case, all buffers should be deoxygenated 
and the reactions carried out under an inert atmosphere 
to prevent re-formation of disulfide. 






Step 2. Add sufficient protein modification reagent from a
stock solution of the reagent to contain 10-20 mol of reagent
for each mole of protein. If the reagent is water-soluble, an
aqueous solution can be used; otherwise, the reagent can be
dissolved in one of the water-miscible nonhydroxylic solvents
recommended for use with amine-reactive reagents. The reagent
concentration should be about 10-20 mM. Upon completion of the
reaction with the protein, an excess of glutathione,
mercaptoethanol, or other soluble low molecular weight thiol
can be added to consume excess modification reagent, thus
ensuring that no reactive species are present during the
purification step.

Iodoacetamides. Reactions with iodoacetamides should be
carried out in the dark, since light can cause reagent
decomposition. Add the stock reagent solution dropwise and
slowly to the gently stirred solution of the protein at room
temperature over a period of about 1 min. Continue stirring
for 2 h.

Maleimides. Reaction conditions are essentially the 
same as with iodoacetamides; however, the selectivity of 
maleimides toward thiol groups is greater, allowing some- 
what more latitude in the buffer pH. Decomposition to 
maleamic acids above pH 8.0 is a competing reaction. Add 
the stock reagent solution dropwise and slowly to the gently 
stirred protein solution at room temperature over a period 
of about 1 min and allow the mixture to react for 2 h. 

Step 3. Separate the conjugate from unreacted 
modification reagent as described in section V. 
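
The reagent addition in Step 2 comes down to simple molar arithmetic. The Python sketch below is only an illustration of that arithmetic: the function name, the argument names, and the example quantities (a protein concentration expressed in µM and a 1 mL reaction volume) are assumptions made for the example and are not prescribed by the procedure.

```python
def reagent_stock_volume_ul(protein_conc_um, protein_volume_ml,
                            molar_excess, stock_conc_mm):
    """Volume of reagent stock (in µL) that supplies the requested molar excess.

    protein_conc_um   -- protein concentration in µM (hypothetical input)
    protein_volume_ml -- volume of the protein solution in mL
    molar_excess      -- mol of reagent per mol of protein (text: 10-20)
    stock_conc_mm     -- reagent stock concentration in mM (text: about 10-20)
    """
    if min(protein_conc_um, protein_volume_ml, molar_excess, stock_conc_mm) <= 0:
        raise ValueError("all quantities must be positive")
    # µM x mL = nmol of protein in the reaction
    protein_nmol = protein_conc_um * protein_volume_ml
    # nmol of reagent needed for the chosen molar excess
    reagent_nmol = protein_nmol * molar_excess
    # 1 mM equals 1 nmol/µL, so nmol / mM gives µL of stock
    return reagent_nmol / stock_conc_mm


# Illustrative values only: 1 mL of 100 µM protein, 15-fold excess, 10 mM stock
volume_ul = reagent_stock_volume_ul(100.0, 1.0, 15.0, 10.0)
print(f"add {volume_ul:.0f} µL of reagent stock")
```

If the computed volume is a large fraction of the reaction volume, a more concentrated stock within the stated range keeps the organic solvent content of the mixture low.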

C. Storage of Conjugates. Conjugates should be 
stored as one normally stores the parent protein. If the 
protein is stable to freezing, then lyophilization is rec- 
ommended for long term storage. Sodium azide at 2 mM 
or thimerosal may be added to inhibit bacterial growth. 
CAUTION: These preservatives may be toxic in live-cell 
use of conjugates. In addition, sodium azide is an inhibitor 
of the enzyme horseradish peroxidase (HRP). Therefore, 
thimerosal should be substituted as a preservative in 
situations where the conjugate is derived from HRP or it 
is anticipated that the conjugate will be used in the 
presence of HRP. Fluorescent dye conjugates should be 
protected from light. 

VII. DETERMINATION OF THE DEGREE OF 
SUBSTITUTION OF PROTEIN CONJUGATES 

Several methods are available for determining the degree 
of substitution of modified proteins. If the modification 
results in the creation of thiol residues, as is often the case 
with bifunctional reagents, it is relatively straightforward 
to determine the degree of substitution by quantitation 
of thiols. Several colorimetric methods for thiol 
determination are available (43, 45, 47). Maleimides introduced 
into proteins can be determined by back-titration with 
2-mercaptoethanol (81). Dyes and many other types of 
molecules introduced into proteins are usually determined 
by spectroscopic techniques, as described below. 

This general procedure should be applicable to dyes 
and other molecules that have significant absorption above 
~280 nm. 

The determination of dye/protein (D/P) levels by 
spectroscopy is accomplished by determining the apparent 
concentration of dye in the conjugate by measuring its 
absorption at its characteristic λmax and then measuring 
the protein concentration of the conjugate by its absorption 
at 280 nm. Because most dyes have some absorption at 
280 nm, the absorption of the conjugate at 280 nm must 
be corrected for the contribution of the dye to obtain the 
correct protein concentration. The ratio of these two 
concentrations, calculated by use of Beer's law ($A = \varepsilon C l$, 
where $\varepsilon$ = molar extinction coefficient, $A$ = absorbance, 
$C$ = molar concentration, and $l$ = path length), is then 
equal to the D/P ratio. 

This method is inexact, because there is no way to know 
precisely how the spectral characteristics of the dye change 
when it is conjugated to the protein. The following 
assumptions and approximations are made. 

(1) The extinction coefficient of the protein-bound dye 
at its absorption maximum is about the same as the 
extinction coefficient of the free dye in solution at its 
absorption maximum. Although there are undoubtedly 
some differences, experiments have shown that this 
assumption is at least approximately correct (64). 

(2) The absorption of the protein-bound dye at 280 nm 
is about the same as the absorption of the free dye in 
solution. This assumption may be less reliable than the 
previous assumption, since there is probably more 
contribution from the linking group to this portion of the 
spectrum, and this group can be substantially changed 
when attached to the protein. The following question 
arises: what is the "free dye"? There is no unambiguous 
answer to this question, since the dye, when attached to 
the protein, is different than the free dye, and the spectral 
properties will be somewhat different. The best choice of 
free dye if the NHS ester was used as the reagent is 
probably the free acid or lysine amide derivative. These 
may be available or can be synthesized. Do not use the 
NHS ester as the free dye, since the N-succinimidyl group 
absorbs strongly at 280 nm. In other cases, sulfonic acids 
can be used when the protein modification reagent was a 
sulfonyl chloride. 

(3) The extinction coefficient of the conjugate at 280 
nm is about the same as the extinction coefficient of the 
native protein. However, extensive modification of the 
protein may change the spectral absorption at 280 nm in 
an unknown manner. 

Although these assumptions are obviously questionable, 
spectroscopy remains the easiest and most convenient 
method of determining D/P ratios. One alternative is to 
determine the protein concentration by weighing the 
conjugate, which eliminates problems in assumption 3, 
but this is tedious and includes the danger that the 
conjugate will denature when dried without buffer, or the 
lyophilized conjugate may contain entrapped buffer salts. 
This method does not eliminate errors from assumptions 
1 and 2. Another alternative is to digest a known amount 
of the conjugate chemically or with a proteolytic enzyme 
to degrade the molecule to small fragments containing 
the dye and then determine the concentration of the dye 
by spectroscopy. This is even more tedious and still does 
not usually give a pure dye product which can be compared 
spectrally with a known derivative. Because of the lack 
of convenient and suitable alternatives, direct 
spectroscopic determination is the most frequently used method 
of estimating D/P ratios (64, 71-74). 

Procedure. Step 1. Obtain absorption spectra of the 
free dye and the dye-protein conjugate (note 1). 

Step 2. Obtain extinction coefficients of the free dye 
and protein from a handbook of dyes and protein tables 
(5, 50). 

Step 3. Perform these calculations: 



BRINKLEY, Methods for Preparing Protein Conjugates 



$$C_d = A_{\max} / \varepsilon_d$$
$$C_p = [A_{280} - A_{\max}(A_{d(280)} / A_{d(\max)})] / \varepsilon_p$$
$$D/P = C_d / C_p$$

where $\varepsilon_d$ is the extinction coefficient of free dye at $\lambda_{\max}$, 
$A_{d(\max)}$ is the absorbance of free dye at $\lambda_{\max}$, $A_{d(280)}$ is the 
absorbance of free dye at 280 nm, $A_{\max}$ is the absorbance of 
dye in conjugate at $\lambda_{\max}$, $\varepsilon_p$ is the extinction coefficient of 
protein at 280 nm, $A_{280}$ is the absorbance of the 
conjugate at 280 nm, $C_d$ is the concentration of dye in 
conjugate (mol/L), and $C_p$ is the concentration of protein 
in conjugate (mol/L). 
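
As a worked illustration of the spectroscopic estimate, the following Python sketch applies Beer's law to the conjugate absorbances and corrects the 280 nm reading for the dye contribution using the free-dye spectrum. It is a minimal sketch under the three assumptions listed above; the function names, the 1 cm default path length, and every numeric value in the example are hypothetical and are not taken from the text.

```python
def dye_correction_factor(a_free_280, a_free_max):
    """Ratio of free-dye absorbance at 280 nm to its absorbance at lambda-max."""
    if a_free_max <= 0:
        raise ValueError("free-dye absorbance at lambda-max must be positive")
    return a_free_280 / a_free_max


def dye_to_protein_ratio(a_max, a_280, eps_dye, eps_protein,
                         correction, path_cm=1.0):
    """Estimate D/P for a conjugate from two absorbance readings.

    a_max       -- absorbance of the conjugate at the dye lambda-max
    a_280       -- absorbance of the conjugate at 280 nm
    eps_dye     -- extinction coefficient of free dye at lambda-max (M^-1 cm^-1)
    eps_protein -- extinction coefficient of protein at 280 nm (M^-1 cm^-1)
    correction  -- free-dye A(280)/A(max), see dye_correction_factor
    path_cm     -- cuvette path length in cm (assumed 1 cm by default)
    """
    if min(eps_dye, eps_protein, path_cm) <= 0:
        raise ValueError("extinction coefficients and path length must be positive")
    # Dye concentration from Beer's law, A = eps * C * l
    c_dye = a_max / (eps_dye * path_cm)
    # Remove the dye's share of the 280 nm absorbance
    a_protein = a_280 - a_max * correction
    if a_protein <= 0:
        raise ValueError("corrected 280 nm absorbance is not positive")
    c_protein = a_protein / (eps_protein * path_cm)
    return c_dye / c_protein


# Hypothetical readings, for illustration only
cf = dye_correction_factor(a_free_280=0.30, a_free_max=1.00)
ratio = dye_to_protein_ratio(a_max=0.60, a_280=1.00,
                             eps_dye=68000.0, eps_protein=210000.0,
                             correction=cf)
print(f"estimated D/P = {ratio:.2f}")
```

Keeping the correction factor as a separate input makes it easy to see how sensitive the result is to the choice of "free dye" discussed under assumption 2.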

ACKNOWLEDGMENT 

I thank Dr. Rosaria Haugland and Danuta Szalecki for 
helpful discussions and advice concerning experimental 
details. I also thank Nan Minchow for preparing the 
structures and tables. 

LITERATURE CITED 

(1) (a) Means, G. E., and Feeney, R. E. (1971) Chemical 
Modification of Proteins, Holden-Day, San Francisco, CA. 
(b) Means, G. E., and Feeney, R. E. (1990) Chemical 
Modification of Proteins: History and Applications. Bioconjugate 
Chem. 1, 2. 

(2) Glazer, A. N., Delange, R. J., and Sigman, D. S. (1975) 
Chemical Modification of Proteins, Laboratory Techniques 
in Biochemistry and Molecular Biology (T. S. Work, and E. 
Work, Eds.) American Elsevier Publishing Co., New York. 

(3) Lundblad, R. L., and Noyes, C. M. (1984) Chemical Reagents 
for Protein Modification, Vols. I and II, CRC Press, New York. 

(4) Pfleiderer, G. (1985) Chemical Modification of Proteins. In 
Modern Methods in Protein Chemistry (H. Tschesche, Ed.) 
Walter de Gruyter, Berlin and New York. 

(5) Eyzaguirre, J. (1987) Chemical Modification of Enzymes, 
Active Site Studies. John Wiley & Sons, New York. 

(6) Wong, S. H. (1991) Chemistry of Protein Conjugation and 
Cross-linking, CRC Press, Boca Raton, FL. 

(7) De Lange, R. J., and Huang, T. S. (1971) Egg white avidin. 
III. Sequence of the 78-residue middle cyanogen bromide 
peptide. Complete amino acid sequences of the protein subunit. 
J. Biol. Chem. 246, 698. 

(8) Fasman, G. D., Ed. (1989) Practical Handbook of 
Biochemistry and Molecular Biology, p 13, CRC Press, Boca Raton, 
FL. 

(9) White, A., Handler, P., and Smith, E. L. (1982) Principles 
of Biochemistry, p 142, McGraw-Hill, New York. 

(10) (a) Korn, A. H., Feairheller, S. H., and Filachione, B. M. 
(1972) Glutaraldehyde: nature of the reagent. J. Mol. Biol. 
66, 526. (b) Hardy, P. M., Nicholls, A. C., and Rydon, N. H. 
(1976) The nature of the crosslinking of proteins by 
glutaraldehyde. Part I. Interaction of glutaraldehyde with the amino 
group of 6-aminohexanoic acid and of α-N-acetyl-lysine. J. 
Chem. Soc. Perkin Trans. 1, 958. 

(11) Cleland, W. W. (1964) Dithiothreitol, a new protective 
reagent for SH groups. Biochemistry 3, 480. 

(12) Gundlach, H. G., Moore, S., and Stein, W. H. (1959) The 
reaction of iodoacetate with methionine. J. Biol. Chem. 234, 
1761. 

(13) Riordan, J. P., and Vallee, B. L. (1972) Diazonium salts as 
specific reagents and probes of protein configuration. Methods 
Enzymol. 25, 261. 

(14) Wilchak, M., Ben-Hur, H., and Bayer, E. A. (1966) 
p-Diazobenzoyl biocytin: A new biotinylating reagent for the 
labelling of tyrosines and histidines in proteins. Biochem. 
Biophys. Res. Commun. 136, 872. 

(15) Hoare, D. G., and Koshland, D. E., Jr. (1966) A procedure 
for the selective modification of carboxyl groups in proteins. 
J. Am. Chem. Soc. 88, 2087. 

(16) Yamada, H., Imoto, T., Fujita, K., Ozaki, K., and Motomura, 
M. (1981) Selective modification of aspartic acid 101 in 
lysozyme by carbodiimide reaction. Biochemistry 20, 4836. 

(17) Renthal, R., Cothran, M., Dawson, N., and Harris, G. J. 
(1987) Fluorescent labeling of bacteriorhodopsin: implications 
for helix connections. Biochim. Biophys. Acta 897, 384. 

(18) Yankeelov, J. A., Jr., Mitchell, C. D., and Crawford, T. H. 
(1968) A simple trimerization of 2,3-butanediones yielding a 
selective reagent for the modification of arginine in proteins. 
J. Am. Chem. Soc. 90, 1664. 

(19) Bond, J. S., Francis, S. H., and Park, J. H. (1970) An essential 
histidine in the catalytic activities of 
3-phosphoglyceraldehyde dehydrogenase. J. Biol. Chem. 245, 1041. 

(20) Stark, G. R., Stein, W. H., and Moore, S. (1981) Relationships 
between the conformation of ribonuclease and its reactivity 
toward iodoacetate. J. Biol. Chem. 236, 436. 

(21) Bragg, P. D., and Hou, C. (1975) Subunit composition, 
function and spatial arrangement in the Ca2+- and 
Mg2+-activated adenosine triphosphatases of Escherichia coli and 
Salmonella typhimurium. Arch. Biochem. Biophys. 167, 311. 

(22) Lomant, A. J., and Fairbanks, G. (1976) Chemical probes 
of extended biological structures: synthesis and properties of 
the cleavable protein cross-linking reagent 
[35S]dithiobis(succinimidyl propionate). J. Mol. Biol. 104, 243. 

(23) Staros, J. V. (1982) N-Hydroxysulfosuccinimide active esters: 
Bis(N-hydroxysulfosuccinimide) esters of two dicarboxylic 
acids are hydrophilic, membrane-impermeant protein 
cross-linkers. Biochemistry 21, 3950. 

(24) Brantzaag, P. (1975) Rhodamine conjugates: specific and 
non-specific binding properties in immunohistochemistry. 
Ann. N.Y. Acad. Sci. 254, 35. 

(25) Takihashi, K. (1968) The reaction of phenylglyoxal with 
arginine residues. J. Biol. Chem. 243, 6171. 

(26) Konishi, K., and Fujiuka, M. (1987) Chemical modification 
of a functional arginine residue of rat liver glycine 
methyltransferase. Biochemistry 26, 8496. 

(27) Wagner, R., and Gassen, H. G. (1976) On the covalent 
binding of mRNA models to the part of the 16S RNA which 
is located in the mRNA binding site of the 30S ribosome. 
Biochem. Biophys. Res. Commun. 65, 519. 

(28) Gray, W. R. (1967) Sequential degradation plus 
dansylation. Methods Enzymol. 11, 469. 

(29) Hartley, B., and Massey, V. (1956) The active center of 
chymotrypsin I. Labelling with a fluorescent dye. Biochim. 
Biophys. Acta 21, 58. 

(30) Titus, J., Haugland, R., Sharrow, S. O., and Segal, D. M. 
(1982) Texas Red, a hydrophilic, red-emitting fluorophore for 
use with fluorescein in dual parameter flow microfluorometric 
and fluorescence microscopic studies. J. Immunol. Methods 
50, 193. 

(31) Wadsworth, P., and Salmon, E. (1986) Preparation and 
characterization of fluorescent analogs of tubulin. Methods 
Enzymol. 134, 519. 

(32) De Belder, A. N., and Granath, K. (1973) Preparation and 
properties of fluorescein labeled dextrans. Carbohydr. Res. 
30, 375. 

(33) Gurd, F. R. N. (1967) Carboxymethylation. Methods 
Enzymol. 11, 532. 

(34) Shiao, D. D. F., Lumry, R., and Rajender, S. (1972) 
Modification of protein properties by change in charge. Eur. 
J. Biochem. 29, Zll. 

(35) Singh, P. (1977) Carbamazepine antigens and antibodies. 
U.S. Patent 4,058,511. 

(36) Lundblad, R. L., and Noyes, C. M. (1984) Chemical Reagents 
for Protein Modification, Vol. I, p 55, CRC Press, New York. 

(37) Ansorge, W. (1988) Non-radioactive automated sequencing 
of oligonucleotides by chemical degradation. Nucleic Acids 
Res. 16, 2203. 

(38) Johnson, A. E., Adkins, H. J., Matthews, B. A., and Cantor, 
C. R. (1962) Distance moved by transfer RNA during 
translocation from the A site to the P site on the ribosome. J. Mol. 
Biol. 156, 113. 

(39) Smyth, D. G., Blumenfeld, O. O., and Konigsberg, W. (1964) 
Reaction of N-ethylmaleimide with peptides and amino acids. 
Biochem. J. 91, 589. 






(40) Brown, R. D., and Matthews, K. S. (1979) Chemical 
modification of lactose repressor proteins using N-substituted 
maleimides. J. Biol. Chem. 254, 6128. 

(41) Ishi, S. S., and Lehrer, J. (1966) Effects of the state of the 
succinimido-ring on the fluorescence and structural properties 
of pyrene maleimide labeled alpha-tropomyosin. Biophys. J. 
50, 75. 

(42) Kosower, N. S. (1979) Bimane fluorescent labels: labeling 
of normal human red cells under physiological conditions. 
Proc. Natl. Acad. Sci. U.S.A. 76, 3382. 

(43) Carlsson, J., Drevin, H., and Axén, R. (1978) Protein 
thiolation and reversible protein-protein conjugation. 
N-succinimidyl 3-(2-pyridyldithio)propionate, a new 
heterobifunctional reagent. Biochem. J. 173, 723. 

(44) Jentoft, J. E., and Dearborn, P. G. (1979) Labeling of proteins 
by reductive methylation using sodium cyanoborohydride. J. 
Biol. Chem. 254, 4359. 

(45) Jue, R., Lambert, J. M., Pierce, L. R., and Traut, R. R. 
(1978) Addition of sulfhydryl groups to Escherichia coli 
ribosomes by protein modification with 2-iminothiolane 
(methyl 4-mercaptobutyrimidate). Biochemistry 17, 6399. 

(46) McCall, M. J., Diril, H., and Meares, C. F. (1990) Simplified 
method for conjugating macrocyclic bifunctional chelating 
agents to antibodies via 2-iminothiolane. Bioconjugate Chem. 
1, 222. 

(47) Julian, R. (1983) A new reagent which may be used to 
introduce sulfhydryl groups into proteins, and its use in the 
preparation of conjugates for Immunoassay. Anal. Biochem. 
132, 68. 

(48) Ghetie, V., Till, M. A., Ghetie, M., Tucker, T., Porter, J., 
Patzer, E. J., Richardson, J. A., Uhr, J. W., and Vitetta, A. 
(1990) Preparation and characterization of conjugates of 
recombinant CD4 and deglycosylated ricin A chain using 
different cross-linkers. Bioconjugate Chem. 1, 24. 

(49) Cumber, J. A., Forrester, J. A., Foxwell, B. M. J., Ross, W. 
C. J., and Thorpe, P. E. (1985) Preparation of antibody-toxin 
conjugates. Methods Enzymol. 112, 207. 

(50) Haugland, R. P. (1989) Handbook of Fluorescent Probes 
and Research Chemicals, p 54, Molecular Probes, Inc., Eugene, 
OR. 

(51) Youle, R. J., and Neville, D. M. (1980) Anti-Thy 1.2 
monoclonal antibody linked to ricin is a potent cell-type specific 
toxin. Proc. Natl. Acad. Sci. U.S.A. 77, 5483. 

(52) Ghosh, S. S., Kao, P. N., McCue, A. W., and Chappelle, H. 
L. (1990) Use of maleimide-thiol coupling chemistry for 
efficient synthesis of oligonucleotide-enzyme conjugate 
hybridization probes. Bioconjugate Chem. 1, 71. 

(53) (a) Komatsu, S. K., Devries, A. L., and Feeney, R. E. (1970) 
Studies of the structure of freezing point-depressing 
glycoproteins from an Antarctic fish. J. Biol. Chem. 245, 2909. (b) 
Vanderheede, J., Ahmed, A. I., and Feeney, R. E. (1972) 
Structure and role of carbohydrate in freezing point-depressing 
glycoproteins from an Antarctic fish. J. Biol. Chem. 247, 7885. 

(54) Freytag, J. W. (1984) Affinity column-mediated 
immunoenzymetric assays: influence of affinity column ligand and 
valency of antibody-enzyme conjugates. Clin. Chem. 30, 1494. 

(55) Thorpe, P. E. (1987) New coupling reagents for the synthesis 
of immunotoxins containing a hindered disulfide bond with 
improved stability in vivo. Cancer Res. 47, 5924. 

(56) Ji, I., and Ji, T. H. (1981) Both α and β subunits of human 
chorionic gonadotropin photoaffinity label the hormone 
receptor. Proc. Natl. Acad. Sci. U.S.A. 78, 5465. 

(57) Keana, J. F. W., and Cai, S. X. (1989) Functionalized 
perfluorophenyl azides: New reagents for photoaffinity labeling. 
J. Fluorine Chem. 43, 151. 

(58) Crocker, P. J., Imai, N., Rajagopalan, K., Boggess, M. A., 
Kwiatkowski, S., Dwyer, L. O., Vanaman, T. C., and Watt, D. 
S. (1990) Heterobifunctional cross-linking reagents 
incorporating perfluorinated aryl azides. Bioconjugate Chem. 1, 419. 

(59) Batra, S. P., and Nicholson, B. H. (1982) 9-Azidoacridine, 
a new photoaffinity label for nucleotide and aromatic binding 
sites in proteins. Biochem. J. 207, 101. 

(60) Hixson, S. H., and Hixson, S. S. (1976) p-Azidophenacyl 
bromide, a versatile photolabile bifunctional reagent. Reaction 
with glyceraldehyde-3-phosphate dehydrogenase. 
Biochemistry 14, 4251. 

(61) Bayley, H. (1983) Photogenerated Reagents in Biochemistry 
and Molecular Biology. Elsevier, New York. 

(62) Tao, T., Lamkin, M., and Schemer, C. (1984) Studies on the 
proximity relationships between thin filament proteins using 
benzophenone-4-maleimide as a site-specific photoreactive 
crosslinker. Biophys. J. 45, 261. 

(63) Valdes-Aguilera, O., and Neckers, D. C. (1989) Aggregation 
phenomena in xanthene dyes. Acc. Chem. Res. 22, 171. 

(64) Midoux, P., Roche, A. C., and Monsigny, M. (1987) 
Quantitation of the binding, uptake, and degradation of 
fluoresceinylated neoglycoproteins by flow cytometry. Cytometry 8, 
327. 

(65) Stryer, L., and Haugland, R. P. (1967) Energy transfer: A 
spectroscopic ruler. Proc. Natl. Acad. Sci. U.S.A. 58, 719. 

(66) Zuk, R. F., Rowley, G. L., and Ullman, E. F. (1979) 
Fluorescence protection immunoassay: A new homogeneous 
assay technique. Clinical Chem. 25, 1554. 

(67) Whitaker, J. E., Haugland, R. P., Moore, P. L., Hewitt, P. 
C., Reese, M., and Haugland, R. P. Cascade Blue derivatives: 
water soluble, reactive, blue emission dyes evaluated as 
fluorescent labels and tracers. Anal. Biochem. In press. 

(68) Spack, E. G., Jr., Packard, B., Wier, M. L., and Edidin, M. 
(1986) Hydrophobic adsorption chromatography to reduce 
nonspecific staining by rhodamine-labeled antibodies. Anal. 
Biochem. 158, 233. 

(69) Carraway, K. L., and Koshland, D. E., Jr. (1968) Reaction 
of tyrosine residues in proteins with carbodiimide reagents. 
Biochem. Biophys. Acta 160, 272. 

(70) Smyth, D. G. (1967) Acetylation of amine and tyrosine 
hydroxyl groups. J. Biol. Chem. 242, 1692. 

(71) Van Dalen, J. P. R., and Haaijman, J. J. (1974) 
Determination of the molar absorption coefficient of bound 
tetramethylrhodamine isothiocyanate relative to fluorescein 
isothiocyanate. J. Immunol. Methods 5, 103. 

(72) Wessendorf, M. W., Tallaksen-Greene, S. J., and 
Wohlhueter, R. M. (1990) A spectrophotometric method for 
determination of fluorophore-to-protein ratios in conjugates of 
the blue fluorophore 7-amino-4-methylcoumarin-3-acetic acid 
(AMCA). J. Histochem. Cytochem. 38, 87. 

(73) Guar, R. K., and Gupta, K. C. (1989) A spectrophotometric 
method for the estimation of amino groups on polymer 
supports. Anal. Biochem. 180, 253. 

(74) Srivastava, P. C., Buchsbaum, D. J., Allred, J. F., Brubaker, 
P. G., Hanna, D. E., and Spiker, J. K. (1990) A new conjugating 
agent for radioiodination of proteins: Low in vivo 
deiodination of a radiolabeled antibody in a tumor model. 
BioTechniques 8, 536. 

(75) Vigers, G. P. A., Cove, M., and McIntosh, J. R. (1988) 
Fluorescent microtubules break up under illumination. J. Cell. 
Biol. 107, 1011. 

(76) Kellogg, D. R., Michison, T. J., and Alberts, B. M. (1988) 
Behavior of microtubules and actin filaments in living 
Drosophila embryos. Development 103, 675. 

(77) Khalfan, H. (1986) Aminomethylcoumarin acetic acid: a 
new fluorescent labelling reagent for proteins. Histochem. J. 
18, 497. 

(78) Gorbsky, G. J., Sammak, P. J., and Borisy, G. G. (1988) 
Microtubule dynamics and chromosome motion visualized in 
living anaphase cells. J. Cell. Biol. 106, 1185. 

(79) Hoffman, K., Finn, F., and Kiso, Y. (1978) Avidin-biotin 
affinity columns. General methods for attaching biotin to 
peptides and proteins. J. Am. Chem. Soc. 100, 3585. 

(80) Bolton, A. E., and Hunter, W. M. (1973) The labelling of 
proteins to high specific radioactivities by conjugation to a 
125I-containing acylating agent. Application to the 
radioimmunoassay. Biochem. J. 133, 629. 

(81) Duncan, J. S., Weston, P. D., and Wrigglesworth, R. (1982) 
A new reagent which may be useful to introduce sulfhydryl 
groups into proteins, and its use in the preparation of conjugates 
for immunoassay. Anal. Biochem. 132, 68. 





Exhibit 7 



BIOCHIMICA ET BIOPHYSICA ACTA 

BBA 

www.elsevier.com/locate/bba 

Review 

Protein engineering of subtilisin 

Philip N. Bryan * 

Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, 9600 Gudelsky Drive, 
Rockville, MD 20850, USA 

Received 21 March 2000; received in revised form 17 August 2000; accepted 28 September 2000 



Abstract 

The serine protease subtilisin is an important industrial enzyme as well as a model for understanding the enormous rate 
enhancements effected by enzymes. For these reasons, along with the timely cloning of the gene, ease of expression and 
purification and availability of atomic resolution structures, subtilisin became a model system for protein engineering studies 
in the 1980s. Fifteen years later, mutations in well over 50% of the 275 amino acids of subtilisin have been reported in the 
scientific literature. Most subtilisin engineering has involved catalytic amino acids, substrate binding regions and stabilizing 
mutations. Stability has been the property of subtilisin which has been most amenable to enhancement, yet perhaps least 
understood. This review will give a brief overview of the subtilisin engineering field, critically review what has been learned 
about subtilisin stability from protein engineering experiments and conclude with some speculation about the prospects for 
future subtilisin engineering. © 2000 Elsevier Science B.V. All rights reserved. 

Keywords: Folding; Stability; Site-directed mutagenesis; Design; Directed evolution 

1. Overview 

In March of 1985, the first UCLA Symposium on 
Protein Structure, Folding and Design convened in 
Keystone, Colorado [105]. The atmosphere reflected a 
distinct giddiness among many of us about the 
prospects of the newly anointed field of 'Protein 
Engineering' [170]. The meeting was timely because in 
the early 1980s a number of technical breakthroughs 
came together which enabled the introduction of 
specific mutations into a gene, heterologous expression 
of the altered protein, and relatively rapid assessment 
of the structural consequences of the mutations 
by X-ray structure determination. In the keynote 
address, however, Frederick Richards of Yale 
University asserted that while site-directed mutagenesis 
was fun, it was really just the next phase of chemical 
modification and unlikely to revolutionize 
understanding of protein folding and enzymology. After 
15 years and thousands of site-directed mutants, it 
probably can be said that a good time has been had 
by all. But given the perspectives of time and 
experience, what has been accomplished from protein 
engineering? This review will give a brief overview of 
the subtilisin engineering field, critically review what 
has been learned about subtilisin stability from 
protein engineering experiments and conclude with some 
speculation about the prospects for future subtilisin 
engineering. 

* Fax: +1 301 738 6255; E-mail: bryan@umbi.umd.edu 

Biochimica et Biophysica Acta 1543 (2000) 203-222 

Mutations in well over 50% of the 275 amino acids 
of subtilisin have been reported in the scientific 
literature (Table 1). Many more examples exist in the 
patent literature and undoubtedly still more lurk 
unfathomed in the freezers of biotechnology 
companies. Subtilisins constitute a large class of microbial 
serine proteases, but the ones most mutagenized are 
those secreted from the Bacillus species 
amyloliquefaciens (BPN'), subtilis (subtilisin E) and lentus 
(savinase). Subtilisins are important industrial enzymes as 
well as models for understanding the enormous rate 
enhancements effected by enzymes. For these 
reasons, along with the timely cloning of the gene, 
ease of expression and purification and availability 
of atomic resolution structures, subtilisin became a 
model system for protein engineering studies. 

0167-4838/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. 
PII: S0167-4838(00)00235-1 

Protein engineering of subtilisin commenced in the 
mid 1960s when the active site serine 221 was con- 
verted to cysteine through chemical modification 
[101,119]. As it turned out, this first alteration re- 
mains one of the most useful. C221 subtilisin is cata- 
lytically wounded to the point that it will barely hy- 
drolyze peptide bonds but turns out to be quite 
reactive with certain activated ester substrates 
[115,116]. This combination of properties has made 
it a useful tool for catalyzing synthetic reactions. 
These include condensation of amino acids to form 
peptides and transesterification reactions such as 
regioselective acylation of sugars [83,98,187,188]. 

The first genetic modifications in subtilisin oc- 
curred rapidly after the gene was cloned in the early 
1980s [72,171,182]. The early standard for genetic 
manipulation was subtilisin BPN', which was engi- 
neered for stability [26,47,183], catalytic mechanism 
[20,168,180] and substrate specificity [46]. The ration- 
ales for modifying subtilisin have expanded over the 
years to include the following eight broad classifica- 
tions: 

(1) Catalytic mechanism: [15,20,31,32,36,41,97, 
101,102,104,119-121,129,130,147,148,168,169,178,180, 
185]. 

(2) Substrate specificity: [5,6,8,9,28-30,38-40,46, 
56-58,85,89-91,94,122,123,144,155,156,161,163-165, 
167,179,181,184]. 

(3) New activities: [1,3,10,11,60-63,79,114,117, 
134,152,193]. 

(4) General proteolytic activity: [54,77,153,154, 
157,159]. 

(5) General stability: [4,16,22,23,25-27,34,35,45, 
48,53,65,74,75,78,80,95,96,99,100,107,110-112,124, 
132,145,158,160,166,183,190,191,194]. 

(6) Stability in exotic environments: [33,47,55,109, 
149,174,186]. 

(7) Surface activity: [17,18,44,69]. 

(8) Folding mechanisms: [19,21,24,42,43,49-51, 
67,68,73,82,86-88,127,128,131,133,138-142,150,151, 
172,176,177]. 

Most subtilisin engineering continues to involve 
catalytic amino acids, substrate binding regions and 
stabilizing mutations. Included in the active site 
category are mutations of the catalytic triad (D32, H64, 
S221), the oxyanion hole (N155) and mutations 
which influence the pKa of H64 through long range 
electrostatics. Most mutations affecting specificity have 
been made in the binding pockets S1 and S4 [12]. 
The S1 amino acids comprise positions 127, 152, 
154, 156 and 166 and the S4 amino acids comprise 
positions 102, 104, 107, 126 and 128. An excellent 
review of the use of protein engineering to understand 
catalytic mechanism and substrate specificity 
appeared in 1995 [113]. 



2. Subtilisin stability 

Stability has been the property of subtilisin which 
has been most amenable to enhancement, yet per- 
haps least understood. Rationalizing stability in- 
creases resulting from mutation in structural and en- 
ergetic detail is limited by the inability to study the 
folding reaction under equilibrium conditions. The 
most basic protein stability experiment is determin- 
ing the free energy of unfolding [70,162]. This ques- 
tion is still not resolved for subtilisin. Biosynthesis of 
subtilisin requires participation of an N-terminal 
prodomain [71]. The folding rate of mature subtilisin 
without the prodomain occurs on a time scale closer 
to geological than biological. By combining biochem- 
ical analysis with information from mutagenesis ex- 
periments, however, one can now make an informed 
estimate of the free energy of folding mature subtili- 
sin and use this information to better evaluate stabi- 
lizing mutations. 

2.1. Energetics of the subtilisin folding reaction 

2.1.1. Calcium binding 

A fundamental variable to address in subtilisin 
stability is its colossal calcium dependence [52,175]. 












Table 1 (continued) 

No.   BPN'  Mutation 
1     A     C w/C78 [108] 
2     Q     R [23]; E, R, W [149] 
3     S     C w/C206/Δ75-83 [149]; T [3] 
4     V     I [53] 
5     P     A, S w/Δ75-83 [149] 
6     Y 
7     G 
8     V     I [25] 
9     S     F [191] 
10    Q 
11    I 
12    K 
13    A 
14    P     L [191] 
15    A     K 
16    L 
17    H 
18    S 
19    Q     E [45] 
20    G 
21    Y 
22    T     C w/C87 [110,183] 
23    G 
24    S     C w/C87 [110,183] 
25    N 
26    V     C w/235 [108]; C w/232 [95]; R [45] 
27    K     C w/C89 [108]; R [54,65] 
28    V 
29    A     C w/C119 [95] 
30    V 
31    I     L [157] 
32    D     N, A [31]; N [51] 
33    S     D, E [5] 
34    G 
35    I 
36    D     Q [148]; C w/C210 [95]; insertion of D (savinase) [174] 
37    S     [191] 
38    S 
39    H 
40    P 
41    D     C w/C80 [95]; Q, A w/Δ75-83 [149] 
42    L 
43    K     N [134]; N, R w/Δ75-83 [149] 
44    V 
45    A     Replacement 45-63 with thermitase sequence [16] 
46    G 
47    G 
48    A 
49    S     D, R [75]; P [65] 
50    M     F [35,111] 
51    V     K [45] 
52    P 
53    S     T [124] 
54    E 
55    T 
56    N 
57    P 
58    F 
59    Q     R 
60    D     N (subt E) [33,194] 
61    N     C w/C98 [160] 
62    N     D [5]; CMM [36] 
63    S     D [25] 
64    H     A [31] 
65    G 
66    T 
67    H     Y, A [3] 
68    V     C [7] 
69    A 
70    G     A, S w/Δ75-83 [149] 
71    T     V [53] 
72    V     I [153] 
73    A     L, H w/Δ75-83 [149] 
74    A 
75    L     Deletion 75-83 [19] 
76    N     D [99,111,174,191] 
77    N     D [45] 
78    S     C w/C1 [108]; D [25] 
79    I     T 
80    G     C w/C41 [95] 
81    V 
82    L 
83    G 
84    V 
85    A     C w/232 [108] 
86    P 
87    S     C w/22 and 24 [110,183]; S (savinase) [54] 
88    A 
89    S     E [45]; E89S (savinase) [65] 
90    T 
91    Y 
92    A     T [153] 
93    V     I [190] 
94    K 
95    V 
96    L 
97    G     D97G (subt E) [33] 
98    A     K [45]; C w/C61 [160] 
99    D     S, K [147] 
100   G     A, V, L [164] 
101   S     H, K, E [165] 
102   G     F [9] 
103   Q     R [33,194]; A [54] 





Table 1 (continued)

P.N. Bryan / Biochimica et Biophysica Acta 1543 (2000) 203-222



104 

105 
106 
107 

108 
109 
110 
111 

112 

113 

114 

115 

116 

117 

118 

119 

120 

121 

122 

123 

124 

125 

126 

127 

128 

129 

130 

131 

132 

133 

134 

135 

136 

137 

138 

139 

140 

141 

142 

143 

144 

145 

146 

147 

148 

149 

150 

151 

152 

153 

154 

155 



S 

W 

I 

I 

N 

G 

I 

E 

W 

A 

I 

A 

N 

N 

M 

D 

V 

I 

N 

M 

S 

L 

G 

G 

P 

S 

G 

S 

A 

A 

L 

K 

A 

A 

V 

D 

K 

A 

V 

A 

S 

G 

V 

V 

V 

V 

A 

A 

A 

G 

N 



A. R, D. F, S, W, Y [8]; W [167); A, 
F [122,123]; V [174]: D [6]; I [54] 



V [35]; G. A, V. L, F [144]; G, A, V 
[123] 

S [99,190] 



T, E [124] 

S [34,191,194] 

C W/C29 [95] 

H120D (savinase) [174] 

C w/C 147 [108] 
S [54] 
L. I [3] 
A, G [3] 

I [124]; A. F [144]; G, A. V [123] 
A. S, V [156] 

F [9]: S12SG (snvinase) [174] 
F [9] 

D [33.124.166]: H, K [165] 
F [9] 



A, V, F [144] 



C W/C122 [lOS] 
C w/243 [95] 



C. S [3] 

A. R. L. F. P. T [161] 

L [20]: A. L. H. Q. R [180] 



Table 1 (continued) 



156 

157 
158 

159 
160 
161 
162 
163 
164 
165 
166 



G 
T 

S 
G 
S 
S 

s 

T 
V 
G 



1 £.1 

167 


I 


168 


P 


169 




170 


vr 
K. 


171 


Y 


172 


P 


173 


S 


174 


V 


175 


1 


176 


A 


177 


V 


178 


G 


179 


A 


180 


V 


181 


D 


182 


S 


183 


S 


184 


N 


185 


Q 


186 


R 


187 


A 


188 


S 


189 


F 


190 


S 


191 


S 


192 


V 


193 


G 


194 


P 


195 


E 


196 


L 


197 


D 


198 


V 


199 


M 


200 


A 


201 


P 


202 


G 


203 


V 



Q, S [184]: S, K [147]; G [33); CMM 
[36] 

158-165 replacement with thermitase 
sequence [16] 



C [191]; deletion 161-164 [155] 



R [45]: SI64T (savinase) T [53] 
C W/C191 [108] 

A, S, C, T. P, V, L. I, F, Y, W [46]; 
D, E, Q. M. K, R [184]; S [124]; D (S); 
R [191]: CMM [36] 



A [111,181] 
Y, L, M [65] 

D, E [112] 



S [33]; N [134]; D [190] 
G [33] 



P [33.124.132] 



C W/C165 [IDS] 
T [191] 

S194P (subt E) [65.191]: A194P 
(savinase) [174] 
Q [74]: Q, E. D, F, M. K, R (savinase 
[65,174] 

N [65.166] 



K [17] 






Table 1 (continued)

No.   BPN'   Mutation
204   S      F [17]
205   I      V205I (savinase) [53]
206   Q      C [111]; C w/C3/Δ75-83 [149]; N, D, Y, E, K, I, F, L, W [17]
207   S
208   T
209   L      F [17]
210   P      C w/C36 [95]
211   G      K, P, L, W [96]
212   N      P, A, V, S [96]
213   K      R [35]; T [147]
214   Y      K w/Δ75-83 [149]
215   G
216   A      E [17]
217   Y      L [181]; K [111]; W [134]; CMM [36]
218   N      S, T, A, C, D, W [26]; S [99,111,190]; M [17]; T, A, H [3]
219   G
220   T      A [15]
221   S      C [1,101,119]; A [31]; seleno [10]
222   M      All [47]; Me-S-C [55]; A [134,194]; G, S, A, V, F [3]
223   A      S [3]
224   S      A, C [3]
225   P      A [1]; G [3]
226   H
227   V
228   A
229   G
230   A
231   A
232   A      C w/C85 [108]; C w/C26 [95]
233   L
234   I
235   L      R [45]; K235L (savinase) [174]
236   S
237   K
238   H
239   P      G, K, R [158]
240   N
241   W
242   T
243   N      C w/C148 [95]
244   T
245   Q
246   V
247   R
248   S      N, A, L [66]
249   S      C w/C273 [108]
250   L
251   E      E [65]
252   N
253   T      C w/C273 [108]
254   T      A [124]
255   T      A [33]
256   K      Y [134]
257   L
258   G
259   D
260   S
261   F
262   Y
263   Y
264   G
265   K
266   G
267   L
268   I
269   N      D
270   V
271   Q      E [2,45]; G [65]
272   A
273   A      C w/C249 or C253 [108]
274   A      A (savinase) [54]
275   Q


A universal feature of subtilisins is the presence of 
one or more calcium binding sites. High resolution 
X-ray structures of subtilisin BPN', as well as several 
homologues [13,14,59,93], have revealed details of a 
conserved, calcium binding site, termed site A. Cal- 
cium at site A is coordinated by five carbonyl oxygen 
ligands and one aspartic acid. Four of the carbonyl 
oxygen ligands to the calcium are provided by a loop 
comprising amino acids 75-83. The geometry of the 
ligands is that of a pentagonal bipyramid whose axis 
runs through the carbonyls of 75 and 79. On one side 
of the loop is the bidentate carboxylate (D41), while 
on the other side is the N-terminus of the protein 
and the side chain of Q2. The seven coordination 
distances range from 2.3 to 2.6 Å, the shortest being
to the aspartyl carboxylate. Three hydrogen bonds
link the N-terminal segment to loop residues 78-82
in a parallel-β arrangement.

Because of the marginal stability of subtilisin 
without calcium bound, the energetics of calcium 
binding at site A are difficult to study indepen- 
dently of the unfolding reaction. By employing an 
inactive and stabilized version of subtilisin, the cal- 
cium-free (apo) form of subtilisin can be produced 
and calcium binding measured by microcalorim- 
etry and fluorescence spectroscopy [19]. The binding 
parameters obtained by titration calorimetry are 



ΔH = −11 kcal/mol and Ka = 7×10⁶ M⁻¹ at 25°C.
The standard free energy of binding is 9.3 kcal/mol,
so the binding of calcium is primarily enthalpically
driven with only a small net loss in entropy
(ΔS_binding = −6.7 cal/°mol). This is surprising since
transfer of calcium into water results in a loss of
entropy of −60 cal/°mol. Therefore the freeing of
water upon calcium binding to the protein will 
make a major contribution to the overall AS of the 
process. The gain in solvent entropy upon binding 
must be compensated for by a loss in entropy of 
the protein. Presumably, the loop amino acids 75- 
83 and the first few N-terminal residues have in- 
creased mobility when calcium is absent from the 
A site. 
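
The relationship between the quoted binding constant, free energy and entropy can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the standard relations ΔG = −RT ln(Ka) and ΔG = ΔH − TΔS; the function names are illustrative and not from the article. With the rounded inputs quoted in the text it reproduces the 9.3 kcal/mol free energy and a small negative entropy of a few cal/°mol, which need not match the published figure to the last digit.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(ka_per_molar, temp_k=298.15):
    """Standard free energy of binding, dG = -RT ln(Ka), in kcal/mol."""
    return -R_KCAL * temp_k * math.log(ka_per_molar)


def binding_entropy(delta_h_kcal, delta_g_kcal, temp_k=298.15):
    """Entropy change in cal/(mol*K), from dG = dH - T*dS."""
    return (delta_h_kcal - delta_g_kcal) / temp_k * 1000.0


# Values quoted in the text for site A at 25 C.
dG = binding_free_energy(7e6)
dS = binding_entropy(-11.0, dG)
print(f"dG = {dG:.1f} kcal/mol, dS = {dS:.1f} cal/(mol*K)")
```

The enthalpy term (−11 kcal/mol) is larger in magnitude than the free energy, which is why the text describes binding as enthalpically driven.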

A second ion binding site (site B) is located 32 Å
from site A in a shallow crevice between two seg- 
ments of polypeptide chain near the surface of the 
molecule. The coordination geometry of this site 
closely resembles a distorted pentagonal bipyramid. 
Three of the formal ligands are derived from the 
protein and include the carbonyl oxygen atom of 
E195 and the two side chain carboxylate oxygens
of D197. Four water molecules complete the first 
coordination sphere. Evidence that site B binds cal- 
cium comes from determining the occupancy of the 
site in a series of X-ray structures from crystals 
grown in 50 mM NaCl with calcium concentrations 
ranging from 1 to 40 mM [112]. In the absence of 
excess calcium, this locus was found to bind a sodium
ion. The binding of these two ions appears to be
mutually exclusive so that as the calcium concentration
increases, the sodium ion is displaced, and a water
molecule appears in its place directly coordinated to
the bound calcium [112]. Analysis of occupancy vs.
calcium concentration indicates that Ka is
approx. 40 M⁻¹.

2.1.2. Calcium-independent stability

Subtilisin does not refold to the native state on an 
observable time scale except under conditions which 
make direct measurements of the equilibrium con- 
stant for folding impractical [64]. Site-directed muta- 
genesis afforded an opportunity to simplify the sub- 
tilisin folding reaction and test whether a calcium- 
free mutant subtilisin might fold more readily than 
the wild type protein. The calcium binding loop is 
formed from a nine amino acid bubble in the last 



turn of a 14-residue α-helix involving amino acids
63-85 [93]. Deleting amino acids 75-83 creates an
uninterrupted helix and abolishes the calcium binding
potential at site A [2,19]. The X-ray structure has
shown that except for the region of the deleted calcium
binding loop, the structures of the mutant and
wild type protein are remarkably similar considering
the size of the deletion. The structures of subtilisin
with and without the deletion superimpose with an
rms difference between 261 Cα positions of 0.17 Å.
The N-terminus of the wild type protein lies beside
the site A loop, furnishing one calcium coordination
ligand, the side chain oxygen of Q2. In Δ75-83
subtilisin, the loop is gone, leaving residues 1-4
disordered, but the helix is uninterrupted and shows
normal helical geometry over its entire length.

The folding rate of Δ75-83 BPN' is much
faster than that of BPN'. Although it is hard to
compare their folding rates under similar conditions
[64,92], it is certain that Δ75-83 BPN' folds at least
10⁴ times faster than BPN' in 0.1 M KPi, pH 7.0.
The unfolding rates of the apo form of BPN'
and Δ75-83 BPN' are very similar [19]. Since

$$\Delta G_{unfolding} = -RT \ln(k_{unfolding}/k_{folding})$$

in a two state system, the simplest interpretation of
the unfolding and refolding rates would mean that
ΔG_unfolding for Δ75-83 BPN' is at least 5.5 kcal/mol
greater at 25°C than for apo BPN'. Recent H-D
exchange data indicate that the total ΔG_unfolding for
Δ75-83 BPN' is approx. 7 kcal/mol in 0.1 M KPi,
pH 7.0 and 25°C (unpublished data). This would mean
that apo BPN' is near the margin of thermodynamic stability.
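
The step from a ratio of folding rates to a free energy difference can be made explicit. This is a minimal sketch, assuming a two-state system with equal unfolding rates, so that the difference in ΔG_unfolding is RT ln(ratio of folding rates). The ratio of 10⁴ used here is an assumption chosen because it is the value consistent with the 5.5 kcal/mol figure in the text; the function name is illustrative.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ddg_from_folding_rate_ratio(rate_ratio, temp_k=298.15):
    """Difference in unfolding free energy (kcal/mol) between two proteins
    that unfold at the same rate but fold at rates differing by rate_ratio."""
    return R_KCAL * temp_k * math.log(rate_ratio)


# Assumed folding-rate ratio of 1e4 at 25 C.
ddg = ddg_from_folding_rate_ratio(1e4)
print(f"ddG_unfolding = {ddg:.1f} kcal/mol")
```

Subtracting this difference from the approx. 7 kcal/mol quoted for the deletion mutant leaves only about 1.5 kcal/mol for apo BPN', which is the sense in which it sits near the margin of stability.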

2.1.3. Calcium-dependent stability

In view of the marginal stability of apo-subtilisin, 
it is evident that calcium binding makes a dominant 
contribution to conformational stability. By binding 
at a specific site in the tertiary structure, calcium 
contributes its binding energy to the stability of the 
native state and contributes to the overall free energy
of folding. The unfolding reaction of subtilisin BPN'
can be divided as follows:

N(Ca2) ⇔ N(Ca) + Ca ⇔ N + 2Ca ⇔ U + 2Ca

where N(Ca2) is the native form of subtilisin with
calcium bound to both sites, N(Ca) is the native
form of subtilisin with calcium bound to site A,





N is the folded apoprotein and U is the unfolded protein.
The total free energy of unfolding is therefore
equal to ΔG1+ΔG2+ΔG3. From the binding constant,
one can calculate the contribution of calcium to the
free energy of subtilisin folding from the equation:

$$\Delta G_{binding} = -RT \ln(1 + K_a[\mathrm{Ca}])$$

Thus the contribution of site A to the stability of
subtilisin in 10 mM calcium is 6.6 kcal/mol at
25°C. The contribution of calcium binding to site B
in 10 mM calcium and 50 mM sodium is only 0.2
kcal/mol. This analysis is at odds with earlier studies
which concluded that calcium binding to site B is
responsible for the large decrease in the inactivation
rate of subtilisin in the presence of millimolar
concentrations of calcium [16,112]. As shown below,
re-examination of calcium-dependent stability data in
light of a better understanding of the energetics of
subtilisin folding shows that site B has relatively little
effect on subtilisin stability in the presence of
moderate concentrations of monovalent cations.
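
The two stabilization figures quoted above follow directly from the binding-energy expression RT ln(1 + Ka[Ca]). The sketch below is illustrative only: the function name is an assumption, the result is reported as a positive magnitude of stabilization, and the binding constants are the ones quoted in the text for site A (7×10⁶ M⁻¹) and site B (approx. 40 M⁻¹).

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def calcium_stabilization(ka_per_molar, calcium_molar, temp_k=298.15):
    """Magnitude of the free energy contributed by calcium binding,
    RT ln(1 + Ka[Ca]), in kcal/mol."""
    return R_KCAL * temp_k * math.log(1.0 + ka_per_molar * calcium_molar)


site_a = calcium_stabilization(7e6, 0.010)  # site A in 10 mM calcium
site_b = calcium_stabilization(40.0, 0.010)  # site B in 10 mM calcium
print(f"site A: {site_a:.1f} kcal/mol, site B: {site_b:.1f} kcal/mol")
```

Because the logarithm saturates only slowly, a site with Ka[Ca] well below 1 (site B here) contributes almost nothing, while site A contributes several kcal/mol.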

2.2. Kinetics of irreversible inactivation

In most protein engineering studies of subtilisin,
stability is defined in terms of the loss of activity as
a function of time. The mechanisms of irreversible
inactivation can be complex, involving unfolding,
autodigestion, aggregation and chemical damage to
certain amino acids. If one wishes to understand
stability by this definition, the rate determining step in
inactivation under the specified condition must be
determined. For example, subtilisin can be inactivated
with hydrogen peroxide due to the oxidation
of the methionine next to the active site serine [146].
If this occurs, it is irrelevant to activity whether the
enzyme remains folded or not. It is also clear that
autodigestion will become a relatively more important
mechanism of inactivation at high concentrations
of enzyme because it is a second order reaction.
In general, however, studies which measure the rate
of inactivation at elevated temperature are indirectly
measuring the rate of unfolding because unfolding
becomes the rate determining step in irreversible
inactivation as temperature is increased. This can be
seen by directly comparing the rate of unfolding of
subtilisin BPN' using calorimetric measurements
with the rate of inactivation under the same conditions
(Fig. 1). Hence changes in rate of irreversible
inactivation at elevated temperatures resulting from
mutation are reflecting a change in activation energy
for unfolding.

Fig. 1. Comparison of the rates of irreversible thermal
inactivation of subtilisin BPN' with the rate of thermal unfolding in 50
mM Tris-HCl, pH 8.0, 50 mM NaCl, 10 mM CaCl2, over the
temperature range of 65-75°C. Unfolding rates are measured
by differential scanning calorimetry. Data are plotted as the
natural logarithm of the rate constants vs. 1/°K. Solid circles
show the rate of unfolding and open circles show the rate of
inactivation. The activation energy of both processes is approx.
80 kcal/mol at 65°C.

Stabilizing mutations in subtilisin characterized by 
changes in the kinetics of inactivation can be classi- 
fied into three groups: (1) stabilizing only in calcium, 
(2) stabilizing only in chelants, and (3) stabilizing in 
both conditions (Table 2). From this partitioning it is 
evident that the mechanism of thermal inactivation 
differs depending on whether the calcium sites are 
occupied. To understand why this is so, one must 
understand how the kinetics of inactivation are re- 
lated to the kinetics of unfolding and how the ki- 
netics of unfolding are related to the kinetics of cal- 
cium loss. 

2.2.1. Inactivation in excess EDTA

Thermal inactivation in EDTA is a two step process
as shown in mechanism 1:

N(Ca) + EDTA ⇔ N + Ca:EDTA ⇒ U ⇒ I    (1)

Fig. 2 compares the rate of calcium dissociation with 
the rate of unfolding as a function of temperature for 
an inactive variant of subtilisin BPN' [19]. Reparti- 
tioning of calcium from site A into a strong chelator 






Table 2

Stabilizing mutants in calcium

BPN'            10 mM CaCl2        10 mM EDTA    Ref.
V8I             2.0                0.8           [25]
S63D            1.1                0.6           [25]
G131D           1.5                0.9           [124]
G169A           5.9                1.1           [111]
L126I           1.4                1.1           [124]
A116E           1.3                1.0           [124]
[illegible]     [illegible]        1.0           [124]
S188P           1.8                1.0           [124]
P172D           1.5                1.1           [112]
T254A           2.0                1.0           [124]
N109S           +                                [99]
loop 45-63      10.0                             [16]

BPN'
Q19E/Q271E      2.0                              [45]
N77K            1.3                              [45]

BPN'            50 µM calcium
K256Y           6.6                              [134]

Subtilisin E    1 mM calcium
S9F             1.4                              [191]
P14L            1.5                              [191]
N76D            1.6                              [191]
N118S           +                                [191]
S161C           3.0                              [191]
G166R           2.0                              [191]
N181D           3.0                              [191]
S194P           7.0 (P in BPN')                  [191]
N218S           2.7                              [191]

Subtilisin E    1 mM calcium
C61-C98         2.3                              [160]

Stabilizing mutations in calcium or EDTA

BPN'            10 mM CaCl2        10 mM EDTA    Ref.
N76D            1.5                2.4           [111]
S78D            +                  1.5           [25]
N218S           3.5                2.6           [26]
Y217K           3.3                2.7           [111]
Q206Cox         5.0                5.0           [111]
Q271E           1.3                1.3           [2]

Stabilizing mutants in chelant

BPN'            10 mM CaCl2        10 mM EDTA    Ref.
C22-C87         1.0                1.5           [110]
M50F            0.75               1.4           [111]
C206-C216       1.0                1.5           [108]

BPN'            0.1 M KPi, pH 12.0
I107V           1.2                              [35]
K213R           1.3                              [35]
M50F            1.5                              [35]

Unclassified stabilizing mutants

Savinase        ΔTm in detergent
K27R            +0.4                             [65]
E89S            +1.1 (S in BPN')                 [65]
R170Y           +0.3                             [65]
S194P           +3.3 (P in BPN')                 [65]
G195E           +0.8 (E in BPN')                 [65]


occurs at a rate of 5 h⁻¹ at 45°C. The kinetic barrier to
calcium removal is 23 kcal/mol. Calcium is an integral
part of the subtilisin structure and its association or
dissociation requires significant but transient disruption
in surrounding protein-protein interactions. This
disruption in structure would explain the high activation
energy and slow kinetics of calcium binding
and dissociation. For example, breaking main chain
hydrogen bonds between the N-terminal region and
the 75-83 loop region would allow the relatively
buried calcium a passageway into or out of the protein.
Global unfolding in 10 mM EDTA at 45°C is
much slower than calcium dissociation, however,
occurring at a rate of 0.04 h⁻¹, with an activation
energy of approx. 60 kcal/mol. Thus the predominant
mechanism of inactivation in EDTA is calcium
dissociation followed by unfolding and loss of activity.
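
The gap between the two rates at 45°C, and how it narrows as temperature rises, can be illustrated with a simple Arrhenius extrapolation. This is only a sketch under a strong assumption that is not made in the article: both activation energies are treated as constant with temperature, so the crossover temperature it prints is a rough indication rather than a measured value. Function names are illustrative.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)
T_REF = 318.15     # 45 C, the temperature at which both rates are quoted


def arrhenius_rate(k_ref, ea_kcal, temp_k, ref_temp_k=T_REF):
    """Rate at temp_k extrapolated from k_ref measured at ref_temp_k."""
    return k_ref * math.exp(-(ea_kcal / R_KCAL) * (1.0 / temp_k - 1.0 / ref_temp_k))


def crossover_temperature(k1, ea1, k2, ea2, ref_temp_k=T_REF):
    """Temperature (K) at which two Arrhenius processes have equal rates."""
    inv_t = 1.0 / ref_temp_k - R_KCAL * math.log(k1 / k2) / (ea2 - ea1)
    return 1.0 / inv_t


# Calcium dissociation: 5 per hour, 23 kcal/mol; unfolding: 0.04 per hour, 60 kcal/mol.
ratio_45c = arrhenius_rate(5.0, 23.0, T_REF) / arrhenius_rate(0.04, 60.0, T_REF)
t_cross = crossover_temperature(5.0, 23.0, 0.04, 60.0)
print(f"ratio at 45 C = {ratio_45c:.0f}, crossover near {t_cross - 273.15:.0f} C")
```

At 45°C calcium dissociation is faster than unfolding by a factor of 125, which is what makes dissociation followed by unfolding the dominant route in EDTA.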

Because calcium binding reaches equilibrium 
quickly relative to the rate of unfolding, mutations 
which stabilize in EDTA must stabilize apo-subtili- 
sin. Increasing the binding constant for one of the 
calcium sites would not help unless the increase in 
binding affinity were enormous. Consider a typical 
experiment in which 1 mM EDTA is added to 100
µg/ml subtilisin (3.6 µM) bound to a stoichiometric
amount of calcium. The calcium will partition between
subtilisin and EDTA according to the equation:

$$[\mathrm{SCa}]/[\mathrm{S}_{total}] = K_{S\text{-}Ca}[\mathrm{S}] / (1 + K_{S\text{-}Ca}[\mathrm{S}] + K_{E\text{-}Ca}[\mathrm{E}])$$

where [SCa]/[S_total] is the fraction of subtilisin bound
to calcium, [S] = total subtilisin and [E] = total
EDTA. Since the binding constant of subtilisin for




Fig. 2. Comparison of the rates of calcium dissociation in excess
fluorescent chelator (quin2) with the rate of thermal unfolding,
for the inactive subtilisin mutant, S11 [19]. The activation
energies are 23 kcal/mol for calcium dissociation in quin2
and 60 kcal/mol for unfolding in 50 mM Tris-HCl, pH 8.0, 50
mM NaCl, 10 mM EDTA, at 45°C. Data are plotted as the
natural logarithm of the rate constants vs. 1/°K. Solid circles
show the rate of unfolding and closed circles show the rate of
calcium dissociation.

calcium at site A (K_S-Ca) = 7×10⁶ M⁻¹ and the binding
constant of EDTA for calcium (K_E-Ca) = 2×10⁸
M⁻¹, then less than 0.02% subtilisin would be bound
to calcium at equilibrium. Examples of mutations
which stabilize apo-subtilisin are M50F and the
disulfides C22-C87 and C206-C216. The irony is that a
mutation which preferentially stabilizes apo-subtilisin
relative to the bound form, will weaken calcium
binding and catalyze inactivation under conditions
of excess calcium and high temperature (see
mechanism 2 below). This phenomenon is displayed in the
M50F mutant, which is more stable than wild type in
10 mM EDTA but less stable in 10 mM CaCl2
(Table 2).
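
The partition of calcium between subtilisin and EDTA in the typical experiment described above can be worked through numerically. This is a minimal sketch of the competitive-binding expression given in the text; the exponents of the two binding constants (7×10⁶ M⁻¹ for subtilisin site A and 2×10⁸ M⁻¹ for EDTA) are assumptions chosen to be consistent with the stated result of less than 0.02% bound, and the function name is illustrative.

```python
def fraction_subtilisin_bound(k_s_ca, subtilisin_molar, k_e_ca, edta_molar):
    """Fraction of subtilisin holding calcium when EDTA competes for it:
    K_S[S] / (1 + K_S[S] + K_E[E])."""
    s_term = k_s_ca * subtilisin_molar
    e_term = k_e_ca * edta_molar
    return s_term / (1.0 + s_term + e_term)


# 3.6 uM subtilisin with 1 mM EDTA; binding constants as assumed above.
fraction = fraction_subtilisin_bound(7e6, 3.6e-6, 2e8, 1e-3)
print(f"fraction bound = {fraction * 100:.4f}%")
```

Raising the subtilisin binding constant tenfold in this expression still leaves well under 1% of the enzyme calcium-bound, which illustrates why only an enormous increase in affinity would help in excess EDTA.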

2.2.2. Inactivation in excess calcium

The inactivation of subtilisin in excess calcium is
diagrammed in mechanism 2:

          Ka (site B)               Ka (site A)
N(2Ca)      <=>      N(Ca) + Ca       <=>       N + 2Ca
  ⇓ k1                 ⇓ k2                       ⇓ k3
  U                    U                          U
                                                  ⇓ >>k3
                                                  I



In excess calcium (e.g. ≥1 mM) and moderate
temperature, calcium binding and dissociation is in rapid
equilibrium because calcium binding is much faster
than unfolding. The rate of inactivation is determined
by the fraction of each native species times
its unfolding rate. Using mechanism 2, one can
show that calcium dependent stabilization of subtilisin
is dominated by site A rather than site B. Fig. 3
plots the rate of inactivation of BPN' at 65°C as a
function of calcium concentration and fits the data to
the following mechanism:

           33 M⁻¹                  2.5×10⁵ M⁻¹
N(Ca2)      <=>      N(Ca) + Ca       <=>       N + 2Ca
  ⇓ 0.0035 s⁻¹         ⇓ 0.0085 s⁻¹               ⇓ 8.7 s⁻¹
  U                    U                          U
                                                  ⇓ >25 s⁻¹
                                                  I

The mechanism predicts that Ka values of site A and
site B are 2.5×10⁵ M⁻¹ and 33 M⁻¹ at 65°C. The
rate of inactivation of subtilisin with only site A
occupied (NCa) is about 1000 times slower than
apo-subtilisin (N) and the rate of inactivation with
both sites occupied (NCa2) is about 2.5 times slower
than with only site A occupied. The second prediction
has been borne out by measuring the calcium
dependent stability of a mutant which has site B but
lacks site A [149]. The rate of inactivation of this
mutant is only 2.4 times slower in 10 mM CaCl2,
50 mM NaCl than in 10 mM EDTA, 50 mM NaCl.
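
The statement that the inactivation rate equals the fraction of each native species times its unfolding rate can be turned into a short calculation. The sketch below is one reading of the fitted mechanism, assuming rapid equilibrium among the native species and sequential filling of site A and then site B; the constants are the 65°C values quoted in the text, while the function name and the population expressions are illustrative assumptions.

```python
K_SITE_A = 2.5e5   # M^-1, site A binding constant at 65 C (from the text)
K_SITE_B = 33.0    # M^-1, site B binding constant at 65 C (from the text)
K_UNFOLD_APO = 8.7       # s^-1, unfolding of N
K_UNFOLD_NCA = 0.0085    # s^-1, unfolding of N(Ca)
K_UNFOLD_NCA2 = 0.0035   # s^-1, unfolding of N(Ca2)


def inactivation_rate(calcium_molar):
    """Population-weighted unfolding rate (s^-1) under rapid calcium equilibrium."""
    w_apo = 1.0
    w_nca = K_SITE_A * calcium_molar
    w_nca2 = K_SITE_A * K_SITE_B * calcium_molar ** 2
    total = w_apo + w_nca + w_nca2
    weighted = (K_UNFOLD_APO * w_apo
                + K_UNFOLD_NCA * w_nca
                + K_UNFOLD_NCA2 * w_nca2)
    return weighted / total


for ca in (1e-4, 1e-3, 1e-2, 1e-1):
    print(f"[Ca] = {ca:g} M -> k_inact = {inactivation_rate(ca):.4f} s^-1")
```

In this sketch most of the drop in rate comes from filling site A; filling site B changes the limiting rate only from 0.0085 to 0.0035 s⁻¹, in line with the roughly 2.5-fold effect described above.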

Another prediction of mechanism 2 is that any 
mutations which stabilize only in the presence of 
calcium will increase the binding constant for calci- 
um to one or both of the calcium sites. This can be 
either through effects on the binding sites themselves, 
as proposed for mutations A116E, G131D, P172D, 
S63D, N76D, S78D and K256Y and the thermitase 
loop 45-63 in BPN', or through indirect effects on 
conformational stability as seen for mutations V8I, 
S53T, L126I, G166S, G169A and T254A (Table 2). 
The indirect effect on calcium binding arises because 
apo-subtilisin displays a loss of cooperativity in the 
unfolding reaction [19]. Thus many mutations which 
stabilize in the presence of calcium do not stabilize 
in the presence of EDTA, because they do not influ- 
ence the rate determining step in the unfolding of 





Fig. 3. The rates of thermal inactivation of subtilisin BPN' at
65°C are plotted as a function of calcium concentration. The
data are fit to mechanism 3 in the text. Data taken from Fig. 1
of Pantoliano et al. [112].



apo-subtilisin. In fact, most mutations identified by 
random mutagenesis stabilize only in the presence of 
calcium. These mutants increase calcium binding af- 
finity because they preferentially stabilize NCa rela- 
tive to N. The premise that the effects of this class of 
mutations indirectly increase calcium affinity by in- 
creasing general stability was tested by introducing 
G166S, G169A and T254A into the rehabilitated S88 
version of Δ75-83 subtilisin [126]. Because the
unfolding of the S88 subtilisin is cooperative in
EDTA, these mutations now stabilize subtilisin S88
in 50 mM Tris-HCl, pH 8.0, 50 mM NaCl, 10 mM
EDTA to approximately the same extent that they
stabilize subtilisin BPN' in 50 mM Tris-HCl, pH 8.0,
50 mM NaCl, 10 mM CaCl2.

Finally mutations which stabilize in excess calcium 
and in EDTA to the same extent must stabilize N 
and NCa to equal extents. This would result in no 
change in calcium affinity. Mutations of this class are 
N218S, Y217K, Q206Cox and Q271E [2,111]. 

2.2.3. Disulfide mutants

Because of the slow rate of the subtilisin folding 
reaction, most stability experiments are affected only 
by the activation energy for unfolding and not the 
equilibrium constant for unfolding. This immediately 
explains why engineering disulfide bonds into subti- 
lisin was so spectacularly unsuccessful in increasing 
resistance to thermal inactivation [95,108]. A
well-designed disulfide cross-link should stabilize a
protein by decreasing the entropic cost of folding. The
loss of conformational entropy in a polymer due to a
cross-link has been estimated by calculating the
probability that the ends of a polymer will
simultaneously occur in the same volume element (v_s)
according to the equation:

$$\Delta S = -R \ln\left[\left(\frac{3}{2\pi l^{2} N}\right)^{3/2} v_s\right]$$

where N is the number of segments and l is the
length of a segment [118]. Good agreement with
experimental data for protein cross-linking has been
achieved using l = 3.8 Å and v_s = 58 Å³, judged to
be the closest approach of two -SH groups [106].

Of 18 different disulfide cross-links which have
been engineered into subtilisin, three have increased
stability [108,110,160]. Two of these stabilize only in
the presence of EDTA. This is not surprising in
retrospect because effects on the stability of the
unfolded state would not generally be manifested in
the activation energy of the unfolding reaction.
This is because the transition state for the unfolding
reaction appears to be compact, with a slightly larger
heat capacity than the native state. Further analysis
of one of the disulfide mutants (C22-C87) in the
background of Δ75-83 BPN' showed that the disulfide
did in fact have the predicted effects on the unfolded
state [150]. The increase in the energy of the unfolded
state due to cross-linking 57 amino acids (22-87
minus the nine amino acid deletion) would be 4.2
kcal/mol at 25°C so the predicted maximum increase in
folding rate at 25°C would be approx. 1000-fold.
Since the 22-87 disulfide accelerated folding by
850-fold at 25°C in 0.1 M KPO4, pH 7.2, the
acceleration of the folding rate is qualitatively consistent
with the simple statistical mechanical model and
suggests that amino acids 22 and 87 are ordered in the
transition state for folding. Accordingly, the small
influence of the disulfide on the transition state for
unfolding wild type BPN' in EDTA (Table 2)
indicates residues 22 and 87 are only slightly less ordered
in the transition state for unfolding in EDTA than in
the folded state. Other mutations which preferentially
decrease the entropy of the unfolded state relative
to the folded state, such as substituting for
glycine or substituting with proline, also are not
necessarily expected to influence the rate of
unfolding.
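
The 4.2 kcal/mol and roughly 1000-fold figures for the 22-87 cross-link can be reproduced from the chain-closure estimate quoted earlier in this section. This is a minimal sketch, assuming a segment length of 3.8 Å, a volume element of 58 Å³ and a 57-residue loop as stated in the text; the function names are illustrative, and the entropy is returned as a positive magnitude of the loss.

```python
import math

R_CAL = 1.987  # gas constant in cal/(mol*K)


def crosslink_closure_probability(n_segments, seg_len_a=3.8, v_s_a3=58.0):
    """Probability that the two ends of an n-segment chain share one volume element."""
    return (3.0 / (2.0 * math.pi * seg_len_a ** 2 * n_segments)) ** 1.5 * v_s_a3


def crosslink_entropy_loss(n_segments):
    """Entropy lost by the unfolded chain on cross-linking, in cal/(mol*K)."""
    return -R_CAL * math.log(crosslink_closure_probability(n_segments))


loop_length = 57  # residues 22-87 minus the nine-residue deletion
d_s = crosslink_entropy_loss(loop_length)
ddg_kcal = d_s * 298.15 / 1000.0  # T*dS at 25 C
max_fold_acceleration = 1.0 / crosslink_closure_probability(loop_length)
print(f"dS = {d_s:.1f} cal/(mol*K), T*dS = {ddg_kcal:.1f} kcal/mol, "
      f"max acceleration ~ {max_fold_acceleration:.0f}-fold")
```

The maximum acceleration is simply the inverse of the closure probability, so the observed 850-fold increase sits just under the upper bound this simple model allows.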






Two engineered disulfide bond mutants have re- 
sulted in significant decreases in the rate of unfold- 
ing. One is a disulfide between residues 61 and 98 in 
subtilisin E, which was modeled after a naturally 
occurring disulfide in aqualysin I from Thermus 
aquaticus [160]. The other is a disulfide identified 
by random mutagenesis of Δ75-83 subtilisin, which
cross-links residues 3 and 206 [149]. The 61-98
cross-link in subtilisin E slows thermal inactivation by
2.3-fold. The 3-206 cross-link in Δ75-83 subtilisin slows
inactivation by 17-fold. The 3-206 disulfide links the
N-terminal strand of subtilisin with the 202-219
β-hairpin. Evidently disruption of the interactions
between these two structural elements is involved in
the transition state for unfolding Δ75-83 subtilisin.
The 3-206 cross-link increases the folding rate of
Δ75-83 subtilisin by only 1.8-fold [126]. Evidently
ordering of these residues occurs after the transition 
state for the folding reaction. 

2.2.4. Random mutagenesis 

Random mutagenesis and screening proved to be 
an effective method to dramatically increase stability 
even without much understanding of the energetics 
of the subtilisin folding reaction. There are two ma- 
jor reasons for this. First, stabilizing mutations are 
fairly common. Although subtilisins are naturally ro- 
bust, on the order of 1% of the random amino acid 
changes measurably increase the half-time of thermal 
inactivation [124]. Second, contributions from indi- 
vidual stabilizing mutations generally accrue cumu- 
latively. Thus large increases in stability can be 
achieved with no radical changes in the tertiary pro- 
tein structure but rather minor, independent altera- 
tions. 

Random mutations have been introduced in vari- 
ous ways, including chemical mutagens, mutagenic 
base analogs, error prone PCR and spiked synthetic 
oligonucleotides. The key element in the process is 
the ability to screen large numbers of mutants for 
increased stability. Phenotypic screening has been
carried out using plate or microtiter dish assays,
which allow assaying proteases from approx. 100-
1000 mutant clones per plate or dish. To screen for
stable mutants, secreted subtilisins are incubated at
elevated temperature long enough to largely inacti-
vate the wild type enzyme. When an assay for hydro-
lytic activity is subsequently performed, only mutants
with stability greater than wild type will exhibit mea-
surable activity. Once stable mutants are identified,
the corresponding colony can be grown up to iden- 
tify the mutation. The labor factor in screening limits 
the number of mutants which can be examined to the 
10-*-10^ range. All single amino acid substitutions in 
subtilisin would yield a total of 5500 different varia-
tions. Since all combinations of double substitutions
would produce 3×10^7 variations, only the popula-
tion of single mutations in subtilisin has been ad-
equately searched for stabilizing events. In fact,
even the population of single substitutions has not 
been completely explored because the nature of the 
genetic code dictates that each amino acid can be 
changed to an average of six other amino acids by 
a single base substitution in the gene. Thus only 
about 30% of the possible single substitution mutants 
would be produced from single base substitutions. 
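
The counts quoted in this paragraph follow from simple arithmetic, sketched below in Python. The residue count of 275 and the use of 20 rather than 19 choices per position are assumptions, chosen because they reproduce the quoted figure of 5500; the variable names are illustrative only.

```python
from math import comb

N_RESIDUES = 275   # assumption: length of mature subtilisin BPN'
N_CHOICES = 20     # assumption: 20 per position reproduces the quoted 5500

# Single substitutions, counted with the round numbers the text appears to use.
singles = N_RESIDUES * N_CHOICES

# Crude estimate for doubles: any single substitution paired with any other.
doubles_rough = singles ** 2

# Stricter count: choose 2 distinct positions, 19 alternatives at each.
doubles_strict = comb(N_RESIDUES, 2) * 19 ** 2

# Share of substitutions reachable by one base change, taking the text's
# average of six accessible amino acids per position.
reachable_fraction = 6 / N_CHOICES

print(f"single substitutions:          {singles}")
print(f"double substitutions (rough):  {doubles_rough:.1e}")
print(f"double substitutions (strict): {doubles_strict:.1e}")
print(f"reachable by one base change:  {reachable_fraction:.0%}")
```

Under either counting convention the double-substitution space is on the order of 10^7, which is why plate-based screening has only been able to sample single substitutions adequately.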

Early studies with chemical mutagens found eight 
stabilizing mutations in BPN' by screening at most 
1200 different single amino acid substitutions 
[26,27,124]. Misincorporation induced by α-thio-
deoxynucleotides identified three additional stabiliz-
ing mutations in BPN' [35], and studies using error-
prone PCR to introduce mutations in subtilisin E
identified 11 stabilizing mutations [191]. Five of the
mutations in subtilisin E were previously identified as
stabilizing in BPN'. The fact that several of the same
mutations have been independently selected indicates
that many of the stabilizing mutations which can be
produced with single base substitutions have been
identified. Since this represents only 30% of the total
possible single amino acid substitutions, many other
stabilizing single substitutions must exist. Two exam-
ples are the directed mutations Y217K and Q206C,
which both stabilize significantly but are not acces-
sible by a single point mutation [111]. Further, Miya-
zaki and Arnold have shown that targeting random
mutagenesis to positions at which stabilizing changes
were already found can identify even better amino
acids at these positions [96].

Once stabilizing single amino acid changes have
been identified, building a highly stable subtilisin 
can be accomplished in a step by step manner by 
combining individual mutations into the same mole- 
cule. A combination of six stabilizing changes in 
BPN' decreased the rate of thermal inactivation by 
> 300-fold [111]. A similar result was achieved in 








P.N. Bryan / Biochimica et Biophysica Acta 1543 (2000) 203-222



subtilisin E by performing multiple rounds of ran-
dom mutagenesis screening and molecular breeding
screening [191]. A hyperstable calcium-free subtilisin 
has also been constructed by a combination of design 
and random mutagenesis. This mutant inactivates 
250000 times more slowly than wild type BPN' in 
10 mM EDTA [126,149]. 
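
The step-by-step strategy rests on the observation that independent stabilizing mutations accrue cumulatively. One way to picture this, as a minimal sketch, is to assume that each mutation multiplies the inactivation half-time by its own factor (equivalently, that the contributions add on a logarithmic scale). The function names are hypothetical, and the equal split of a 300-fold effect across six mutations is an illustrative assumption, not a measured result.

```python
import math

def combined_fold(per_mutation_folds):
    """Overall fold-change if independent effects multiply."""
    return math.prod(per_mutation_folds)

def equal_share(total_fold, n_mutations):
    """Per-mutation factor if n mutations contributed equally."""
    return total_fold ** (1.0 / n_mutations)

# Illustrative case: six mutations that together slow inactivation 300-fold.
per_mutation = equal_share(300.0, 6)
rebuilt = combined_fold([per_mutation] * 6)

print(f"per-mutation factor: {per_mutation:.2f}")
print(f"recombined effect:   {rebuilt:.0f}-fold")

# The same bookkeeping on a log scale: the logarithms simply add.
log_sum = sum(math.log(f) for f in [per_mutation] * 6)
print(f"sum of ln(factors):  {log_sum:.3f} (ln 300 = {math.log(300):.3f})")
```

On this picture each individual change can be modest (under 3-fold here) while the combination is large, which is consistent with the text's point that no radical structural change is needed.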



3. Future prospects

3.1. Design vs. screening 

What strategies will prove most effective for engi- 
neering other properties of subtilisin? At the moment 
directed evolution seems to have become more fash- 
ionable than structure-based design as a method to 
'engineer' subtilisin. Part of this trend may be a re- 
sult of earlier disappointments with the ability to 
predict the phenotype of designed mutants, but 
most is a result of advances in random mutagenesis 
methods [76,135,190,192]. For example, synthesis of 
oligonucleotides using preformed trinucleotide phos- 
phoramidites will circumvent some of the limitations 
inherent to the genetic code [81]. Furthermore new 
methods of DNA shuffling allow efficient creation of 
chimeric proteases to try and combine desirable 
properties from parent enzymes [103,137,173]. Di- 
rected evolution and molecular breeding methods 
have proven useful for finding mutations which are 
better than wild type for several different properties 
[136]. There is always the danger, however, that the 
good will become the enemy of the best [125]. The 
new techniques do not circumvent the combinatorial 
problems inherent to purely random methods. Thus 
random approaches will be good for improving a 
global property such as stability which can be ac- 
crued incrementally but will not be successful when 
significant improvements depend on synergistic mu-
tational events. Relying on the accumulation of sin-
gle mutants ensures that only solutions very close to
the starting structure will be found. The best solu-
tions may lie unmined a few layers deeper in muta-
tional space.

Optimizing subtilisin activity for a specific protein
sequence or for a new substrate are cases in which
synergistic mutations probably will be required. Con-
sider the basic organization of the substrate binding
pockets of subtilisin. Although the deep S1 and
S4 binding clefts are the primary determinants of
substrate specificity, subtilisin is relatively non-spe-
cific in its cleavage preferences for protein sub-
strates. The broad specificity is in part a consequence
of the fact that the substrate peptide backbone in-
serts itself between residues 100-104 and 125-129 to
become the central strand in an antiparallel β-sheet.
This is different from the more specific chymotryp-
sin family of proteases in which a structural equiv-
alent of residues 100-104 is absent [113]. The best
solutions to accommodate new substrates may in-
volve altering main chain interactions and this will
involve multiple synergistic mutations. When high-
resolution structural information becomes available
for the subtilisin class of prohormone converting en-
zymes, it will be interesting to see what structural
differences account for sequence-specific processing
activity.

Introducing the bias of intelligent design into ran-
dom mutagenesis experiments has been criticized be-
cause of limitations in the intelligence of designers.
The dilemma is as follows. The more target positions
for mutagenesis are restricted, the greater the ability
of screening to identify synergistic mutations. But the
greater restriction of the target positions, the greater
the danger of flawed design. In many cases, however,
only minimal design is required to identify produc-
tive regions of sequence space. Past experiences with
directed mutagenesis have shown that mutations
which have the greatest influence on substrate specif-
icity involve either direct contacts with the substrate
or electrostatic changes in the vicinity of the active
site. This is also borne out by experiences with ran-
dom mutagenesis and screening. For example, You,
Chen and Arnold have randomly mutated subtilisin
E using error-prone PCR and screened for increased
activity in dimethylformamide against a defined pep-
tide substrate [33,189]. Twelve mutations were iden-
tified in the screen. Of the twelve, two are involved in
direct binding with the peptide, three are mutations
of Asp or Glu to neutral amino acids at positions
which would influence the pKa of H64, five are mu-
tations which increase general stability and only two
are at positions whose connections with activity in
DMF are difficult to rationalize.






3.2. Phage display selection



Recent successes in displaying subtilisin on the sur-
face of phagemid particles greatly expand the pos-
sibilities for selecting new properties [3,37,84]. While
less direct than culture dish or microtiter plate meth- 
ods for screening, phage display methods increase the 
number of mutants which can be screened by at least 
four orders of magnitude. The ability to display li-
braries of 1 X 10^ independent mutants allows screen- 
ing all combinations of amino acids at six specified 
positions. The obvious limitation of phage display is
that selection is achieved by binding activity, so that 
selection of a catalytic event is not trivial. In one case 
random mutations at 25 positions were introduced 
into S221C subtilisin to select for improved peptide 
ligation. Ligase activity allowed product capture by 
the ligation of the subtilisin phagemids with im- 
proved ligase activity to a biotin-tagged peptide [3]. 
A second study successfully displayed fully active 
subtilisin on phage, although this involved addition 
of the subtilisin inhibitor CI2 to the culture medium. 
Selection for a change in P4 specificity then was car- 
ried out using a biotin-linked peptide diphenylester 
inhibitor [84]. 
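
The link between library size and the number of positions that can be randomized exhaustively is a matter of exponentiation. The sketch below tabulates it for the 20 amino acids and, as an assumption not stated in the text, for NNS degenerate codons (32 codons per position) at the DNA level; the function name is hypothetical.

```python
def library_size(n_positions, variants_per_position):
    """Distinct sequences when every listed position is fully randomized."""
    return variants_per_position ** n_positions

AMINO_ACIDS = 20   # protein-level diversity per position
NNS_CODONS = 32    # assumption: NNS codons (4 x 4 x 2) at the DNA level

print("positions  protein-level  DNA-level (NNS)")
for n in range(1, 8):
    protein = library_size(n, AMINO_ACIDS)
    dna = library_size(n, NNS_CODONS)
    print(f"{n:9d}  {protein:13.2e}  {dna:15.2e}")
```

Each additional randomized position multiplies the required library size by 20 (or 32), so a gain of several orders of magnitude in screening capacity buys only a few extra fully randomized positions.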

3.3. Uncoupling prodomain processing from selection

A major limitation to any screening/selection 
method is that mutations affecting catalytic activity 
potentially affect the biosynthesis of subtilisin which
is linked to autoprocessing of the prodomain [51]. 
Hence the selection of mutants will be biased toward 
enzymes which efficiently autoprocess. If the desired 
phenotype is activity toward a particular amino acid 
sequence, then the autoprocessing mechanism ac- 
tually might be used to aid in selection by mutating 
the processing site on the prodomain to the target 
sequence [5,84]. This is apparently what occurred in 
the natural evolution of prohormone converting en- 
zymes since the C-terminal sequence of the prodo- 
main reflects the processing specificity [143]. If the 
desired phenotype is activity against a novel sub- 
strate, however, one needs to uncouple the biosyn- 
thesis of subtilisin from the selection for the new 
activity. This has been accomplished by using the 
Δ75-83 version of subtilisin, which is capable of fold-
ing without the prodomain [2,3,19,37].



3.4. Full circle 



The first genetically engineered subtilisin appeared 
in the literature in 1985 and addressed the sensitivity 
of subtilisin to oxidation by peroxide [47]. It had 
been determined earlier that M222 is sensitive to ox- 
idation leading to inactivation of the enzyme [146]. 
While it was clear that substituting for M222 would 
prevent this mechanism of inactivation by peroxide, 
it was not clear what amino acid would best substi- 
tute for methionine in providing optimal substrate 
interactions and preserving activity. For this reason, 
all 19 substitutions were made and the catalytic and 
stability properties of each compared. Thus even the 
first example of genetic-based protein engineering in 
subtilisin was in fact a random mutagenesis experi- 
ment which could be targeted to just one position 
because of detailed biochemical and structural infor- 
mation. After 15 years the best approach to 'engi- 
neering' desired properties into subtilisin probably 
remains targeted random mutagenesis, in which tar- 
get selection is informed by all available information. 
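
Targeted random mutagenesis at a single position, as in the M222 study, amounts to enumerating every alternative residue at that site. A minimal sketch of that enumeration follows; the function name and the variant naming convention (wild type, position, replacement) are illustrative assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes, 20 standard residues

def saturation_variants(wild_type, position):
    """All single substitutions at one position, e.g. M222A ... M222Y."""
    return [f"{wild_type}{position}{aa}"
            for aa in AMINO_ACIDS if aa != wild_type]

m222_variants = saturation_variants("M", 222)
print(len(m222_variants))
print(", ".join(m222_variants))
```

Restricting the search to one well-chosen position keeps the library small enough that every member can be characterized individually.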

Acknowledgements 

The author wishes to thank Patrick Alexander,
Biao Ruan and Susan Strausberg for critically read-
ing the manuscript. This study was supported by
NIH grant GM42560. 



References 

[1] L. Abrahmsen, J. Tom, J. Burnier, K.A. Butcher, A. Kossiakoff,
J.A. Wells, Engineering subtilisin and its substrates for
efficient ligation of peptide bonds in aqueous solution,
Biochemistry 30 (1991) 4151-4159.

[2] O. Almog, T. Gallagher, M. Tordova, J. Hoskins, P. Bryan,
G.L. Gilliland, Crystal structure of calcium-independent
subtilisin BPN' with restored thermal stability folded
without the prodomain, Proteins 31 (1998) 21-32.

[3] S. Atwell, J.A. Wells, Selection for improved subtiligases by
phage display, Proc. Natl. Acad. Sci. USA 96 (1999) 9497-9502.

[4] K.H. Bae, J.S. Jang, K.S. Park, S.H. Lee, S.M. Byun,
Improvement of thermal stability of subtilisin J by changing the
primary autolysis site, Biochem. Biophys. Res. Commun.
207 (1995) 20-24.

[5] M.D. Ballinger, J. Tom, J.A. Wells, Designing subtilisin
BPN' to cleave substrates containing dibasic residues,
Biochemistry 34 (1995) 13312-13319.

[6] M.D. Ballinger, J. Tom, J.A. Wells, Furilisin: a variant of
subtilisin BPN' engineered for cleaving tribasic substrates,
Biochemistry 35 (1996) 13579-13585.

[7] L.M. Bech, S. Branner, S. Hastrup, K. Breddam, Introduction
of a free cysteinyl residue at position 68 in the subtilisin
savinase, based on homology with proteinase K, FEBS Lett.
297 (1992) 164-166.

[8] L.M. Bech, S.B. Sorensen, K. Breddam, Mutational
replacements in subtilisin 309. Val104 has a modulating effect on
the P4 substrate preference, Eur. J. Biochem. 209 (1992)
869-874.

[9] L.M. Bech, S.B. Sorensen, K. Breddam, Significance of
hydrophobic S4-P4 interactions in subtilisin 309 from Bacillus
lentus, Biochemistry 32 (1993) 2845-2852.

[10] I.M. Bell, M.L. Fisher, Z.P. Wu, D. Hilvert, Kinetic studies
on the peroxidase activity of selenosubtilisin, Biochemistry
32 (1993) 3754-3762.

[11] I.M. Bell, D. Hilvert, Peroxide dependence of the
semisynthetic enzyme selenosubtilisin, Biochemistry 32 (1993)
13969-13973.

[12] A. Berger, I. Schechter, Mapping the active site of papain
with the aid of peptide substrates and inhibitors, Philos.
Trans. R. Soc. London Ser. B Biol. Sci. 257 (1970) 249-264.

[13] C. Betzel, S. Klupsch, G. Papendorf, S. Hastrup, S. Branner,
K.S. Wilson, Crystal structure of the alkaline proteinase
savinase from Bacillus lentus at 1.4 Å resolution, J. Mol. Biol.
(1992) 427-445.

[14] W. Bode, E. Papamokos, D. Musil, The high-resolution
X-ray crystal structure of the complex formed between
subtilisin Carlsberg and eglin C, an elastase inhibitor from the
leech Hirudo medicinalis, Eur. J. Biochem. 166 (1987) 673-

[15] S. Braxton, J.A. Wells, The importance of a distal hydrogen
bonding group in stabilizing the transition state in subtilisin
BPN', J. Biol. Chem. 266 (1991) 11797-11800.

[16] S.B. Braxton, J.A. Wells, Incorporation of a stabilizing
Ca-binding loop into subtilisin BPN', Biochemistry 31 (1992)
7796-7801.

[17] P.F. Brode 3rd, C.R. Erwin, D.S. Rauch, B.L. Barnett, J.M.
Armpriester, E.S. Wang, D.N. Rubingh, Subtilisin BPN'
variants: increased hydrolytic activity on surface-bound
substrates via decreased surface activity, Biochemistry 35 (1996)
3162-3169.

[18] P.F. Brode 3rd, C.R. Erwin, D.S. Rauch, D.S. Lucas, D.N.
Rubingh, Enzyme behavior at surfaces. Site-specific variants
of subtilisin BPN' with enhanced surface stability, J. Biol.
Chem. 269 (1994) 23538-23543.

[19] P. Bryan, P. Alexander, S. Strausberg, F. Schwarz, L. Wang,
G. Gilliland, D.T. Gallagher, Energetics of folding subtilisin
BPN', Biochemistry 31 (1992) 4937-4945.

[20] P. Bryan, M.W. Pantoliano, S.G. Quill, H.Y. Hsiao, T. Poulos,
Site-directed mutagenesis and the role of the oxyanion hole
in subtilisin, Proc. Natl. Acad. Sci. USA 83 (1986)


[21] P. Bryan, L. Wang, J. Hoskins, S. Ruvinov, S. Strausberg,
P. Alexander, O. Almog, G. Gilliland, T.D. Gallagher,
Catalysis of a protein folding reaction: mechanistic implications
of the 2.0 Å structure of the subtilisin-prodomain complex,
Biochemistry 34 (1995) 10310-10318.

[22] P.N. Bryan, in: T.J. Ahern, M.C. Manning (Eds.),
Pharmaceutical Biotechnology, part B, Plenum Press, New York,
1992, pp. 147-181.

[23] P.N. Bryan, in: B.A. Shirley (Ed.), Protein Stability and
Folding: Theory and Practice, vol. 40, Humana Press,
Totowa, NJ, 1995, pp. 271-289.

[24] P.N. Bryan, in: U. Shinde, M. Inouye (Eds.), Intramolecular
Chaperones and Protein Folding, R.G. Landes, Austin, TX,
1995, pp. 85-112.

[25] P.N. Bryan, M.P. Pantoliano, Combining Mutations for the
Stabilization of Subtilisin, United States: Genex Corp.,
1988.

[26] P.N. Bryan, M.L. Rollence, M.W. Pantoliano, J. Wood,
B.C. Finzel, G.L. Gilliland, A.J. Howard, T.L. Poulos,
Proteases of enhanced stability: characterization of a
thermostable variant of subtilisin, Proteins Struct. Funct. Genet. 1
(1986) 326-334.

[27] P.N. Bryan, M.L. Rollence, J. Wood, S. Quill, S. Dodd, M.
Whitlow, K. Hardman, M.W. Pantoliano, in: J. Gavora,
D.F. Gerson, J. Luong, A. Storer, J.H. Woodley (Eds.),
Biotechnology Research and Applications, Elsevier Applied
Science Publ., Essex, 1988, pp. 57-67.

[28] P. Carter, L. Abrahmsen, J.A. Wells, Probing the
mechanism and improving the rate of substrate-assisted catalysis
in subtilisin BPN', Biochemistry 30 (1991) 6141-6148.

[29] P. Carter, B. Nilsson, J.P. Burnier, D. Burdick, J.A. Wells,
Engineering subtilisin BPN' for site-specific proteolysis,
Proteins Struct. Funct. Genet. 6 (1989) 240-248.

[30] P. Carter, J.A. Wells, Engineering enzyme specificity by
'substrate-assisted catalysis', Science 237 (1987) 394-399.

[31] P. Carter, J.A. Wells, Dissecting the catalytic triad of a
serine protease, Nature 332 (1988) 564-568.

[32] P. Carter, J.A. Wells, Functional interaction among catalytic
residues in subtilisin BPN', Proteins Struct. Funct. Genet. 7
(1990) 335-342.

[33] K. Chen, F.H. Arnold, Tuning the activity of an enzyme for
unusual environments: sequential random mutagenesis of
subtilisin E for catalysis in dimethylformamide, Proc. Natl.
Acad. Sci. USA 90 (1993) 5618-5622.

[34] N.M. Chu, Y. Chao, R.C. Bi, The 2 Å crystal structure of
subtilisin E with PMSF inhibitor, Protein Eng. 8 (1995) 211-
215.

[35] B.C. Cunningham, J.A. Wells, Improvement in the alkaline
stability of subtilisin using an efficient random mutagenesis
and screening procedure, Protein Eng. 1 (1987) 319-325.

[36] B.G. Davis, X. Shang, G. DeSantis, R.R. Bott, J.B. Jones,
The controlled introduction of multiple negative charge at
single amino acid sites in subtilisin Bacillus lentus, Bioorg.
Med. Chem. 7 (1999) 2293-2301.

[37] S. Demartis, A. Huber, F. Viti, L. Lozzi, L. Giovannoni, P.
Neri, G. Winter, D. Neri, A strategy for the isolation of
catalytic activities from repertoires of enzymes displayed on
phage, J. Mol. Biol. 286 (1999) 617-633.

[38] G. DeSantis, P. Berglund, M.R. Stabile, M. Gold, J.B.
Jones, Site-directed mutagenesis combined with chemical
modification as a strategy for altering the specificity of the
S1 and S1' pockets of subtilisin Bacillus lentus, Biochemistry
37 (1998) 5968-5973.

[39] G. DeSantis, J.B. Jones, Probing the altered specificity and
catalytic properties of mutant subtilisin chemically modified
at position S156C and S166C in the S1 pocket, Bioorg. Med.
Chem. 7 (1999) 1381-1387.

[40] G. DeSantis, X. Shang, J.B. Jones, Toward tailoring the
specificity of the S1 pocket of subtilisin B. lentus: chemical
modification of mutant enzymes as a strategy for removing
specificity limitations, Biochemistry 38 (1999) 13391-13397.

[41] D. Dinakarpandian, B.C. Shenoy, D. Hilvert, D.E. McRee,
M. McTigue, P.R. Carey, Electric fields in active sites:
substrate switching from null to strong fields in thiol- and
selenol-subtilisins, Biochemistry 38 (1999) 6659-6667.

[42] J. Eder, M. Rheinnecker, A.R. Fersht, Folding of subtilisin
BPN': characterization of a folding intermediate,
Biochemistry 32 (1993) 18-26.

[43] J. Eder, M. Rheinnecker, A.R. Fersht, Folding of subtilisin
BPN': role of the pro-sequence, J. Mol. Biol. 233 (1993)
293-304.

[44] M.R. Egmond, W.P. Antheunisse, C.J. van Bemmel, P.
Ravestein, J. de Vlieg, H. Peters, S. Branner, Engineering surface
charges in a subtilisin: the effects on electrophoretic and
ion-exchange behaviour, Protein Eng. 7 (1994) 793-800.

[45] C.R. Erwin, B.L. Barnett, J.D. Oliver, J.F. Sullivan, Effects
of engineered salt bridges on the stability of subtilisin BPN',
Protein Eng. 4 (1990) 87-97.

[46] D.A. Estell, T.P. Graycar, J.V. Miller, D.B. Powers, J.P.
Burnier, P.G. Ng, J.A. Wells, Probing steric and hydrophobic
effects on enzyme-substrate interactions by protein
engineering, Science 233 (1986) 659-663.

[47] D.A. Estell, T.P. Graycar, J.A. Wells, Engineering an
enzyme by site-directed mutagenesis to be resistant to chemical
oxidation, J. Biol. Chem. 260 (1985) 6518-6521.

[48] C.O. Fagain, Understanding and increasing protein stability,
Biochim. Biophys. Acta 1252 (1995) 1-14.

[49] T.D. Gallagher, P. Bryan, G. Gilliland, Calcium-free
subtilisin by design, Proteins Struct. Funct. Genet. 16 (1993) 205-
213.

[50] T.D. Gallagher, G. Gilliland, P. Bryan, in: R. Bott, C.
Betzel (Eds.), Crystal Structure Analysis of Subtilisin BPN'
Mutants Engineered for Studying Thermal Stability, Plenum
Press, New York, 1996.

[51] T.D. Gallagher, G. Gilliland, L. Wang, P. Bryan, The
prosegment-subtilisin BPN' complex: crystal structure of a
specific foldase, Structure 3 (1995) 907-914.

[52] N. Genov, B. Filippi, P. Dolashka, K.S. Wilson, C. Betzel,
Stability of subtilisins and related proteinases (subtilases),
Int. J. Pept. Protein Res. 45 (1995) 391-400.

[53] D.W. Goddette, T. Christiansen, B.F. Ladin, M. Lau, J.R.
Mielenz, C. Paech, R.B. Reynolds, S.S. Yang, C.R. Wilson,
Strategy and implementation of a system for protein
engineering, J. Biotechnol. 28 (1993) 41-54.

[54] T. Graycar, M. Knapp, G. Ganshaw, J. Dauberman, R.
Bott, Engineered Bacillus lentus subtilisins having altered
flexibility, J. Mol. Biol. 292 (1999) 97-109.

[55] H. Gron, L.M. Bech, S. Branner, K. Breddam, A highly
active and oxidation-resistant subtilisin-like enzyme
produced by a combination of site-directed mutagenesis and
chemical modification, Eur. J. Biochem. 194 (1990) 897-901.

[56] H. Gron, L.M. Bech, S.B. Sorensen, M. Meldal, K.
Breddam, Studies of binding sites in the subtilisin from Bacillus
lentus by means of site directed mutagenesis and kinetic
investigations, Adv. Exp. Med. Biol. 379 (1996) 105-112.

[57] H. Gron, K. Breddam, Interdependency of the binding
subsites in subtilisin, Biochemistry 31 (1992) 8967-8971.

[58] H. Gron, M. Meldal, K. Breddam, Extensive comparison of
the substrate preferences of two subtilisins as determined
with peptide substrates which are based on the principle of
intramolecular quenching, Biochemistry 31 (1992) 6011-
6018.

[59] P. Gros, K.H. Kalk, W.G.J. Hol, Calcium binding to
thermitase, J. Biol. Chem. 266 (1991) 2953-2961.

[60] D. Haring, B. Hubert, E. Schuler, P. Schreier, Reasoning
enantioselectivity and kinetics of seleno-subtilisin from the
subtilisin template, Arch. Biochem. Biophys. 354 (1998) 263-
269.

[61] D. Haring, P. Schreier, From detergent additive to
semisynthetic peroxidase-simplified and up-scaled synthesis of
seleno-subtilisin, Biotechnol. Bioeng. 59 (1998) 786-791.

[62] D. Haring, P. Schreier, Chemical engineering of enzymes:
altered catalytic activity, predictable selectivity and
exceptional stability of the semisynthetic peroxidase
seleno-subtilisin, Naturwissenschaften 86 (1999) 307-312.

[63] D. Haring, E. Schuler, A. Waldemar, C.R. Saha-Moller, P.
Schreier, Semisynthetic enzymes in asymmetric synthesis:
enantioselective reduction of racemic hydroperoxides
catalyzed by seleno-subtilisin, J. Org. Chem. 64 (1999) 832-835.

[64] T. Hayashi, M. Matsubara, D. Nohara, S. Kojima, K.
Miura, T. Sakai, Renaturation of the mature subtilisin
BPN' immobilized on agarose beads, FEBS Lett. 350
(1994) 109-112.

[65] J. Heringa, P. Argos, M.R. Egmond, J. de Vlieg, Increasing
thermal stability of subtilisin from mutations suggested by
strongly interacting side-chain clusters, Protein Eng. 8 (1995)
21-30.

[66] H. Hirohara, M. Philipp, M.L. Bender, Binding rates, O-S
substitution effects, and the pH dependence of chymotrypsin
reactions, Biochemistry 16 (1977) 1573-1580.

[67] Z. Hu, K. Haghjoo, F. Jordan, Further evidence for the
structure of the subtilisin propeptide and for its interactions
with mature subtilisin, J. Biol. Chem. 271 (1996) 3375-3384.

[68] Z. Hu, X. Zhu, F. Jordan, M. Inouye, A covalently trapped
folding intermediate of subtilisin E: spontaneous
dimerization of a prosubtilisin E Ser49Cys mutant in vivo and its
autoprocessing in vitro, Biochemistry 33 (1994) 562-569.

[69] W. Huang, J. Wang, D. Bhattacharyya, L.G. Bachas,
Improving the activity of immobilized subtilisin by
site-specific attachment to surfaces, Anal. Chem. 69 (1997) 4601-
4607.

[70] A. Ikai, Denaturation of subtilisin BPN' and its derivatives
in aqueous guanidine hydrochloride solutions, Biochim.
Biophys. Acta 445 (1976) 182-193.

[71] H. Ikemura, H. Takagi, M. Inouye, Requirement of pro
sequence for the production of active subtilisin in Escherichia
coli, J. Biol. Chem. 262 (1987) 7859-7864.

[72] M. Jacobs, M. Eliasson, M. Uhlen, J. Flock, Cloning,
sequencing and expression of subtilisin Carlsberg from Bacillus
licheniformis, Nucleic Acids Res. 13 (1985) 8913-8926.

[73] S.C. Jain, U. Shinde, Y. Li, M. Inouye, H.M. Berman, The
crystal structure of an autoprocessed Ser221Cys-subtilisin
E-propeptide complex at 2.0 Å resolution, J. Mol. Biol. 284
(1998) 137-144.

[74] J.S. Jang, K.H. Bae, S.M. Byun, Effect of the weak
Ca(2+)-binding site of subtilisin J by site-directed mutagenesis on
heat stability, Biochem. Biophys. Res. Commun. 188
(1992) 184-189.

[75] J.S. Jang, D.K. Park, M. Chun, S.M. Byun, Identification of
autoproteolytic cleavage site in the Asp-49 mutant subtilisin
J by site-directed mutagenesis, Biochim. Biophys. Acta 1162
(1993) 233-235.

[76] L.J. Jensen, K.V. Andersen, A. Svendsen, T. Kretzschmar,
Scoring functions for computational algorithms applicable to
the design of spiked oligonucleotides, Nucleic Acids Res. 26
(1998) 697-702.

[77] H. Kano, S. Taguchi, H. Momose, Cold adaptation of a
mesophilic serine protease, subtilisin, by in vitro random
mutagenesis, Appl. Microbiol. Biotechnol. 47 (1997) 46-51.

[78] T.W. Keough, Y. Sun, B.L. Barnett, M.P. Lacey, M.D.
Bauer, E.S. Wang, C.R. Erwin, Rapid analysis of
single-cysteine variants of recombinant proteins, Methods Mol. Biol.
61 (1996) 171-183.

[79] R.D. Kidd, P. Sears, D.H. Huang, K. Witte, C.H. Wong,
G.K. Farber, Breaking the low barrier hydrogen bond in a
serine protease, Protein Sci. 8 (1999) 410-417.

[80] R.D. Kidd, H.P. Yennawar, P. Sears, C.-H. Wong, G.K.
Farber, A weak calcium binding site in subtilisin BPN' has
a dramatic effect on protein stability, J. Am. Chem. Soc. 118
(1996) 1645-1650.

[81] A. Knappik, L. Ge, A. Honegger, P. Pack, M. Fischer, G.
Wellnhofer, A. Hoess, J. Wolle, A. Pluckthun, B. Virnekas,
Fully synthetic human combinatorial antibody libraries
(HuCAL) based on modular consensus frameworks and CDRs
randomized with trinucleotides, J. Mol. Biol. 296 (2000) 57-
86.

[82] T. Kobayashi, M. Inouye, Functional analysis of the
intramolecular chaperone. Mutational hot spots in the subtilisin
pro-peptide and a second site suppressor mutation within the
subtilisin molecule, J. Mol. Biol. 226 (1992) 931-933.

[83] W. Kullman, Enzymatic Peptide Synthesis, CRC Press, Boca
Raton, FL, 1987.

[84] D. Legendre, N. Laraki, T. Graslund, M.E. Bjornvad, M.
Bouchet, P.A. Nygren, T.V. Borchert, J. Fastrez, Display of
active subtilisin 309 on phage: analysis of parameters
influencing the selection of subtilisin variants with changed
substrate specificity from libraries using phosphonylating
inhibitors, J. Mol. Biol. 296 (2000) 87-102.

[85] J.P. Leis, C.E. Cameron, Engineering proteases with altered
specificity, Curr. Opin. Biotechnol. 5 (1994) 403-408.

[86] Y. Li, Z. Hu, F. Jordan, M. Inouye, Functional analysis of
the propeptide of subtilisin E as an intramolecular chaperone
for protein folding. Refolding and inhibitory abilities of
propeptide mutants, J. Biol. Chem. 270 (1995) 25127-25132.

[87] Y. Li, M. Inouye, Autoprocessing of prothiolsubtilisin E in
which active-site serine 221 is altered to cysteine, J. Biol.
Chem. 269 (1994) 4169-4174.

[88] Y. Li, M. Inouye, The mechanism of autoprocessing of the
propeptide of prosubtilisin E: intramolecular or
intermolecular event?, J. Mol. Biol. 262 (1996) 591-594.

[89] W. Lu, I. Apostol, M.A. Qasim, N. Warne, R. Wynn, W.L.
Zhang, S. Anderson, Y.W. Chiang, E. Ogin, I. Rothberg, K.
Ryan, M. Laskowski Jr., Binding of amino acid side-chains
to S1 cavities of serine proteinases, J. Mol. Biol. 266 (1997)
441-461.

[90] K. Masuda-Momma, T. Hatanaka, K. Inouye, K. Kanaori,
A. Tamura, K. Akasaka, S. Kojima, I. Kumagai, K. Miura,
B. Tonomura, Interaction of subtilisin BPN' and
recombinant Streptomyces subtilisin inhibitors with substituted P1
site residues, J. Biochem. 114 (1993) 553-559.

[91] K. Masuda-Momma, T. Shimakawa, K. Inouye, K. Hiromi,
S. Kojima, I. Kumagai, K. Miura, B. Tonomura,
Identification of amino acid residues responsible for the changes of
absorption and fluorescence spectra on the binding of
subtilisin BPN' and Streptomyces subtilisin inhibitor, J.
Biochem. 114 (1993) 906-911.

[92] M. Matsubara, E. Kurimoto, S. Kojima, K. Miura, T.
Sakai, Achievement of renaturation of subtilisin BPN' by a
novel procedure using organic salts and a digestible mutant
of Streptomyces subtilisin inhibitor, FEBS Lett. 342 (1994)
193-196.

[93] C.A. McPhalen, M.N.G. James, Structural comparison of
two serine proteinase-protein inhibitor complexes:
eglin-C-subtilisin Carlsberg and CI-2-subtilisin novo, Biochemistry
27 (1988) 6582-6598.

[94] H.C. Mei, Y.C. Liaw, Y.C. Li, D.C. Wang, H. Takagi, Y.C.
Tsai, Engineering subtilisin YaB: restriction of substrate
specificity by the substitution of Gly124 and Gly151 with Ala,
Protein Eng. 11 (1998) 109-117.

[95] C. Mitchinson, J.A. Wells, Protein engineering of disulfide
bonds in subtilisin BPN', Biochemistry 28 (1989) 4807-
4815.

[96] K. Miyazaki, F.H. Arnold, Exploring nonnatural
evolutionary pathways by saturation mutagenesis: rapid improvement
of protein function, J. Mol. Evol. 49 (1999) 716-720.

[97] N. Mizushima, D. Spellmeyer, S. Hirono, D. Pearlman, P.
Kollman, Free energy perturbation calculations on binding
and catalysis after mutating threonine 220 in subtilisin,
J. Biol. Chem. 266 (1991) 11801-11809.

[98] T. Nakatsuka, T. Sasaki, E.T. Kaiser, Peptide segment
coupling catalyzed by the semisynthetic enzyme thiolsubtilisin,
J. Am. Chem. Soc. 109 (1987) 3808-3810.

[99] L.O. Narhi, Y. Stabinsky, M. Levitt, L. Miller, R. Sachdev,
S. Finley, S. Park, C. Kolvenbach, T. Arakawa, M.
Zukowski, Enhanced stability of subtilisin by three point
mutations, Biotechnol. Appl. Biochem. 13 (1991) 12-24.

[100] E. Narinx, E. Baise, C. Gerday, Subtilisin from
psychrophilic antarctic bacteria: characterization and site-directed
mutagenesis of residues possibly involved in the adaptation
to cold, Protein Eng. 10 (1997) 1271-1279.

[101] K.E. Neet, D.E. Koshland Jr., The conversion of serine at
the active site of subtilisin to cysteine: a 'chemical
mutation', Proc. Natl. Acad. Sci. USA 56 (1966) 1606-1611.

[102] K.E. Neet, A. Nanci, D.E. Koshland Jr., Properties of
thiol-subtilisin. The consequences of converting the active
serine residue to cysteine in a serine protease, J. Biol. Chem.
243 (1968) 6392-6401.

[103] J.E. Ness, M. Welch, L. Giver, M. Bueno, J.R. Cherry,
T.V. Borchert, W.P. Stemmer, J. Minshull, DNA shuffling
of subgenomic sequences of subtilisin, Nat. Biotechnol. 17
(1999) 893-896.

[104] T.P. O'Connell, R.M. Day, E.V. Torchilin, W.W.
Bachovchin, J.G. Malthouse, A 13C-NMR study of the role of
Asn-155 in stabilizing the oxyanion of a subtilisin
tetrahedral adduct, Biochem. J. 326 (1997) 861-866.

[105] D. Oxender (Organizer), UCLA Symposium on Protein
Structure, Folding and Design, J. Cell. Biochem. Suppl.
9B (1985) 91-145.

[106] C.N. Pace, G.R. Grimsley, J.A. Thomson, B.J. Barnett,
Conformational stabilities and activity of ribonuclease T1
with zero, one and two intact disulfide bonds, J. Biol.
Chem. 263 (1988) 11820-11825.

[107] C. Paech, D.W. Goddette, T. Christiansen, C.R. Wilson,
Unusual ligand binding at the active site domain of an
engineered mutant of subtilisin BL, Adv. Exp. Med. Biol.
379 (1996) 257-268.

[108] M.P. Pantoliano, R.C. Ladner, Computer Designed
Stabilized Proteins, United States: Genex Corp., 1987.

[109] M.W. Pantoliano, Proteins designed for challenging
environments and catalysis in organic solvents, Curr. Opin.
Struct. Biol. 2 (1992) 559-568.

[110] M.W. Pantoliano, R.C. Ladner, P.N. Bryan, M.L.
Rollence, J.F. Wood, T.L. Poulos, Protein engineering of
subtilisin BPN': stabilization through the introduction of two
cysteines to form a disulfide bond, Biochemistry 26 (1987)
2077-2082.

M.W. Pantoliano. M. Whitlow. J.F. Wood, S.W. Dodd. 
K.D. Hardman. M.L. Rollence. P.N. Bryan. Large in- 
creases in general stability for subtilisin BPN' through in- 
cremental changes in the free energy of unfolding. Biochem- 
istry 28 (1989) 7205-7213. 

M.W. Pantoliano, M. Whitlow. J.F. Wood, M.L. Rollence, 
B.C. Finzel, G. Gilliland. T.L. Poulos. P.N. Bryan, The 
engineering of binding affinity at metal ion binding sites 
for the stabilization of proteins: subtilisin as a tesr case. 
Biochemistry 27 (1988) 831 1-8317. 



[113] J.J. Perona, C.S. Craik, Structural basis of substrate specificity
in the serine proteases, Protein Sci. 4 (1995) 337-360.

[114] E.B. Peterson, D. Hilvert, Nonessential active site residues
modulate selenosubtilisin's kinetic mechanism, Biochemistry
34 (1995) 6616-6620.

[115] M. Philipp, M.L. Bender, Kinetics of subtilisin and
thiolsubtilisin, Mol. Cell. Biochem. 51 (1983) 5-32.

[116] M. Philipp, I.H. Tsai, M.L. Bender, Comparison of the
kinetic specificity of subtilisin and thiolsubtilisin toward
n-alkyl p-nitrophenyl esters, Biochemistry 18 (1979) 3769-3773.

[117] E. Plettner, G. DeSantis, M.R. Stabile, J.B. Jones, Modulation
of esterase and amidase activity of subtilisin Bacillus
lentus by chemical modification of cysteine mutants, J. Am.
Chem. Soc. 121 (1999) 4977-4981.

[118] D.C. Poland, H.A. Scheraga, Statistical mechanics of
noncovalent bonds in polyamino acids. VIII. Covalent loops in
proteins, Biopolymers 3 (1965) 379-399.

[119] L. Polgar, M.L. Bender, The reactivity of thiol-subtilisin,
an enzyme containing a synthetic functional group,
Biochemistry 6 (1967) 610-620.

[120] L. Polgar, M.L. Bender, Chromatography and activity of
thiol-subtilisin, Biochemistry 8 (1969) 136-141.

[121] S.N. Rao, U.C. Singh, P.A. Bash, P.A. Kollman, Free energy
perturbation calculations on binding and catalysis
after mutating Asn 155 in subtilisin, Nature 328 (1987)
551-554.

[122] M. Rheinnecker, G. Baker, J. Eder, A.R. Fersht, Engineering
a novel specificity in subtilisin BPN', Biochemistry 32
(1993) 1199-1203.

[123] M. Rheinnecker, J. Eder, P.S. Pandey, A.R. Fersht, Variants
of subtilisin BPN' with altered specificity profiles,
Biochemistry 33 (1994) 221-225.

[124] M.L. Rollence, D. Filpula, M.W. Pantoliano, P.N. Bryan,
Engineering thermostability in subtilisin BPN' by in vitro
mutagenesis, CRC Crit. Rev. Biotechnol. 8 (1988) 217-224.

[125] I.S. Rombauer, The Joy of Cooking, 1931.

[126] B. Ruan, Folding of Subtilisin: Study of Independent Folding
and Pro-domain Catalyzed Folding, Ph.D. Dissertation,
University of Maryland, College Park, MD, 1998.

[127] B. Ruan, J. Hoskins, P.N. Bryan, Rapid folding of calcium-free
subtilisin by a stabilized pro-domain mutant, Biochemistry
38 (1999) 8562-8571.

[128] B. Ruan, J. Hoskins, L. Wang, P.N. Bryan, Stabilizing the
subtilisin BPN' pro-domain by phage display selection:
how restrictive is the amino acid code for maximum protein
stability? [In process citation], Protein Sci. 7 (1998) 2345-2353.

[129] A.J. Russell, A.R. Fersht, Rational modification of enzyme
catalysis by engineering surface charge, Nature 328 (1987)
496-500.

[130] A.J. Russell, P.G. Thomas, A.R. Fersht, Electrostatic effects
on modification of charged groups in the active site
cleft of subtilisin by protein engineering, J. Mol. Biol. 193
(1987) 803-813.

[131] S. Ruvinov, L. Wang, B. Ruan, O. Almog, G. Gilliland, E.
Eisenstein, P. Bryan, Engineering the independent folding
of the subtilisin BPN' prodomain: analysis of two-state
folding vs. protein stability, Biochemistry 36 (1997)
10414-10421.

[132] A. Sattler, S. Kanka, K.H. Maurer, D. Riesner, Thermostable
variants of subtilisin selected by temperature-gradient
gel electrophoresis, Electrophoresis 17 (1996) 784-792.

[133] R. Schulein, J. Kreft, S. Gonski, W. Goebel, Preprosubtilisin
Carlsberg processing and secretion is blocked after
deletion of amino acids 97-101 in the mature part of the
enzyme, Mol. Gen. Genet. 227 (1991) 137-143.

[134] P. Sears, M. Schuster, P. Wang, K. Wine, C.-H. Wong,
Engineering subtilisin for peptide coupling: studies on the
effects of counterions and site-specific modifications on the
stability and specificity of the enzyme, J. Am. Chem. Soc.
116 (1994) 6521-6530.

[135] S. Shafikhani, R.A. Siegel, E. Ferrari, V. Schellenberger,
Generation of large libraries of random mutants in Bacillus
subtilis by PCR-based plasmid multimerization, BioTechniques
23 (1997) 304-310.

[136] Z. Shao, F.H. Arnold, Engineering new functions and altering
existing functions, Curr. Opin. Struct. Biol. 6 (1996)
513-518.

[137] Z. Shao, H. Zhao, L. Giver, F.H. Arnold, Random-priming
in vitro recombination: an effective tool for directed evolution,
Nucleic Acids Res. 26 (1998) 681-683.

[138] U. Shinde, X. Fu, M. Inouye, A pathway for conformational
diversity in proteins mediated by intramolecular
chaperones, J. Biol. Chem. 274 (1999) 15615-15621.

[139] U. Shinde, M. Inouye, Folding mediated by an intramolecular
chaperone: autoprocessing pathway of the precursor
resolved via a substrate assisted catalysis mechanism,
J. Mol. Biol. 247 (1995) 390-395.

[140] U. Shinde, M. Inouye, Folding pathway mediated by an
intramolecular chaperone: characterization of the structural
changes in pro-subtilisin E coincident with autoprocessing,
J. Mol. Biol. 252 (1995) 25-30.

[141] U. Shinde, M. Inouye, Propeptide-mediated folding in subtilisin:
the intramolecular chaperone concept, Adv. Exp.
Med. Biol. 379 (1996) 147-154.

[142] U.P. Shinde, J.J. Liu, M. Inouye, Protein memory through
altered folding mediated by intramolecular chaperones,
Nature 389 (1997) 520-522.

[143] R.J. Siezen, J.A.M. Leunissen, U. Shinde, in: U. Shinde,
M. Inouye (Eds.), Intramolecular Chaperones and Protein
Folding, R.G. Landes, Austin, TX, 1995, pp. 233-256.

[144] S.B. Sorensen, L.M. Bech, M. Meldal, K. Breddam, Mutational
replacements of the amino acid residues forming the
hydrophobic S4 binding pocket of subtilisin 309 from Bacillus
lentus, Biochemistry 32 (1993) 8994-8999.

[145] R. Sowdhamini, N. Srinivasan, B. Shoichet, D.V. Santi, C.
Ramakrishnan, P. Balaram, Stereochemical modeling of
disulfide bridges. Criteria for introduction into proteins
by site-directed mutagenesis, Protein Eng. 3 (1989) 95-103.

[146] C.E. Stauffer, D. Etson, The effect on subtilisin activity of
oxidizing a methionine residue, J. Biol. Chem. 244 (1969)
5333-5338.

[147] M.J. Sternberg, F.R. Hayes, A.J. Russell, P.G. Thomas,
A.R. Fersht, Prediction of electrostatic effects of engineering
of protein charges, Nature 330 (1987) 86-88.

[148] M.J.E. Sternberg, F.R.F. Hayes, A.J. Russell, P.G. Thomas,
A.R. Fersht, Prediction of electrostatic effects of engineering
of protein charges, Nature 330 (1987) 86-88.

[149] S. Strausberg, P. Alexander, D.T. Gallagher, G. Gilliland,
B.L. Barnett, P. Bryan, Directed evolution of a subtilisin
with calcium-independent stability, Bio/technology 13
(1995) 669-673.

[150] S. Strausberg, P. Alexander, L. Wang, D.T. Gallagher, G.
Gilliland, P. Bryan, An engineered disulfide crosslink accelerates
the refolding rate of calcium-free subtilisin by 850-fold,
Biochemistry 32 (1993) 10371-10377.

[151] S. Strausberg, P. Alexander, L. Wang, F. Schwarz, P.
Bryan, Catalysis of a protein folding reaction: thermodynamic
and kinetic analysis of subtilisin BPN' interactions
with its propeptide fragment, Biochemistry 32 (1993) 8112-8119.

[152] R. Syed, Z.P. Wu, J.M. Hogle, D. Hilvert, Crystal structure
of selenosubtilisin at 2.0-Å resolution, Biochemistry 32
(1993) 6157-6164.

[153] S. Taguchi, A. Ozaki, H. Momose, Engineering of a cold-adapted
protease by sequential random mutagenesis and a
screening system, Appl. Environ. Microbiol. 64 (1998) 492-495.

[154] S. Taguchi, A. Ozaki, T. Nonaka, Y. Mitsui, H. Momose,
A cold-adapted protease engineered by experimental evolution
system, J. Biochem. 126 (1999) 689-693.

[155] H. Takagi, S. Arafuka, M. Inouye, M. Yamasaki, The effect
of amino acid deletion in subtilisin E, based on structural
comparison with a microbial alkaline elastase, on its
substrate specificity and catalysis, J. Biochem. 111 (1992)
584-588.

[156] H. Takagi, T. Maeda, I. Ohtsu, Y.C. Tsai, S. Nakamori,
Restriction of substrate specificity of subtilisin E by introduction
of a side chain into a conserved glycine residue,
FEBS Lett. 395 (1996) 127-132.

[157] H. Takagi, Y. Morinaga, H. Ikemura, M. Inouye, Mutant
subtilisin E with enhanced protease activity obtained by
site-directed mutagenesis, J. Biol. Chem. 263 (1988)
19592-19596.

[158] H. Takagi, Y. Morinaga, H. Ikemura, M. Inouye, The role
of Pro-239 in the catalysis and heat stability of subtilisin E,
J. Biochem. 105 (1989) 953-956.

[159] H. Takagi, I. Ohtsu, S. Nakamori, Construction of novel
subtilisin E with high specificity, activity and productivity
through multiple amino acid substitutions, Protein Eng. 10
(1997) 985-989.

[160] H. Takagi, T. Takahashi, H. Momose, M. Inouye, Y. Maeda,
H. Matsuzawa, T. Ohta, Enhancement of the thermostability
of subtilisin E by introduction of a disulfide bond
engineered on the basis of structural comparison with a
thermophilic serine protease, J. Biol. Chem. 265 (1990)
6874-6878.

[161] H. Takagi, M. Yamamoto, I. Ohtsu, S. Nakamori, Random
mutagenesis into the conserved Gly154 of subtilisin E:
isolation and characterization of the revertant enzymes,
Protein Eng. 11 (1998) 1205-1210.

[162] K. Takahashi, J.M. Sturtevant, Thermal denaturation of
Streptomyces subtilisin inhibitor, subtilisin BPN', and the
inhibitor-subtilisin complex, Biochemistry 20 (1981) 6185-6190.

[163] T. Tanaka, H. Matsuzawa, S. Kojima, I. Kumagai, K.
Miura, T. Ohta, P1 specificity of aqualysin I (a subtilisin-type
serine protease) from Thermus aquaticus YT-1, using
P1-substituted derivatives of Streptomyces subtilisin inhibitor,
Biosci. Biotechnol. Biochem. 62 (1998) 2035-2038.

[164] T. Tanaka, H. Matsuzawa, T. Ohta, Engineering of S2 site
of aqualysin I; alteration of P2 specificity by excluding P2
side chain, Biochemistry 37 (1998) 17402-17407.

[165] T. Tanaka, H. Matsuzawa, T. Ohta, Identification and designing
of the S3 site of aqualysin I, a thermophilic subtilisin-related
serine protease, J. Biochem. 125 (1999) 1016-1021.

[166] T. Tange, S. Taguchi, S. Kojima, K. Miura, H. Momose,
Improvement of a useful enzyme (subtilisin BPN') by an
experimental evolution system, Appl. Microbiol. Biotechnol.
41 (1994) 239-244.

[167] A.V. Teplyakov, J.M. van der Laan, A.A. Lammers, H.
Kelders, K.H. Kalk, O. Misset, L.J. Mulleners, B.W. Dijkstra,
Protein engineering of the high-alkaline serine protease
PB92 from Bacillus alcalophilus: functional and structural
consequences of mutation at the S4 substrate binding pocket,
Protein Eng. 5 (1992) 413-420.

[168] P.G. Thomas, A.J. Russell, A.R. Fersht, Tailoring the pH
dependence of enzyme catalysis using protein engineering,
Nature 318 (1985) 375-376.

[169] P.J. Tonge, P.R. Carey, Length of the acyl carbonyl bond
in acyl-serine proteases correlates with reactivity, Biochemistry
29 (1990) 10723-10727.

[170] K.M. Ulmer, Protein engineering, Science 219 (1983) 666-671.

[171] N. Vasantha, L.D. Thompson, C. Rhodes, C. Banner, J.
Nagle, D. Filpula, Genes for alkaline and neutral protease
from Bacillus amyloliquefaciens contain a large open-reading
frame between the regions coding for signal sequence
and mature protein, J. Bacteriol. 159 (1984) 811-819.

[172] A. Volkov, F. Jordan, Evidence for intramolecular processing
of prosubtilisin sequestered on a solid support, J. Mol.
Biol. 262 (1996) 595-599.

[173] A.A. Volkov, Z. Shao, F.H. Arnold, Recombination and
chimeragenesis by in vitro heteroduplex formation and in
vivo repair, Nucleic Acids Res. 27 (1999) e18.

[174] C. von der Osten, S. Branner, S. Hastrup, L. Hedegaard,
M.D. Rasmussen, H. Bisgard-Frantzen, S. Carlsen, J.M.
Mikkelsen, Protein engineering of subtilisins to improve
stability in detergent formulations, J. Biotechnol. 28
(1993) 55-68.



[175] G. Voordouw, C. Milo, R.S. Roche, Role of bound calcium
in thermostable, proteolytic enzymes. Separation of intrinsic
and calcium ion contributions to the kinetic thermal
stability, Biochemistry 15 (1976) 3716-3724.

[176] L. Wang, B. Ruan, S. Ruvinov, P.N. Bryan, Engineering
the independent folding of the subtilisin BPN' pro-domain:
correlation of pro-domain stability with the rate of subtilisin
folding, Biochemistry 37 (1998) 3165-3171.

[177] L. Wang, S. Ruvinov, S. Strausberg, T.D. Gallagher, G.
Gilliland, P. Bryan, Prodomain mutations at the subtilisin
interface: correlation of binding energy and the rate of
catalyzed folding, Biochemistry 34 (1995) 15415-15420.

[178] P.P. Wangikar, J.O. Rich, D.S. Clark, J.S. Dordick, Probing
enzymic transition state hydrophobicities, Biochemistry
34 (1995) 12302-12310.

[179] J.A. Wells, Additivity of mutational effects in proteins,
Biochemistry 29 (1990) 8509-8517.

[180] J.A. Wells, B.C. Cunningham, T.P. Graycar, D.A. Estell,
Importance of hydrogen-bond formation in stabilizing the
transition state of subtilisin, Philos. Trans. R. Soc. London
317 (1986) 415-423.

[181] J.A. Wells, B.C. Cunningham, T.P. Graycar, D.A. Estell,
Recruitment of substrate-specificity properties from one enzyme
into a related one by protein engineering, Proc. Natl.
Acad. Sci. USA 84 (1987) 5167-5171.

[182] J.A. Wells, E. Ferrari, D.J. Henner, D.A. Estell, E.Y. Chen,
Cloning, sequencing and secretion of Bacillus amyloliquefaciens
subtilisin in Bacillus subtilis, Nucleic Acids Res. 11
(1983) 7911-7925.

[183] J.A. Wells, D.B. Powers, In vivo formation and stability of
engineered disulfide bonds in subtilisin, J. Biol. Chem. 261
(1986) 6564-6570.

[184] J.A. Wells, D.B. Powers, R.R. Bott, T.P. Graycar, D.A.
Estell, Designing substrate specificity by protein engineering
of electrostatic interactions, Proc. Natl. Acad. Sci. USA 84
(1987) 1219-1223.

[185] A.K. Whiting, W.L. Peticolas, Details of the acyl-enzyme
intermediate and the oxyanion hole in serine protease catalysis,
Biochemistry 33 (1994) 552-561.

[186] C.-H. Wong, S.-T. Chen, W.J. Hennen, J.A. Bibbs, Y.-F.
Wang, J.L.-C. Liu, M.W. Pantoliano, M. Whitlow, P.N.
Bryan, Enzymes in organic synthesis: use of subtilisin and
a highly stable mutant derived from multiple site-specific
mutations, J. Am. Chem. Soc. 112 (1990) 945-953.

[187] C.H. Wong, Enzymatic catalysts in organic synthesis,
Science 244 (1989) 1145-1152.

[188] C.H. Wong, G.J. Shen, R.L. Pederson, Y.F. Wang, W.J.
Hennen, Enzymatic catalysis in organic synthesis, Methods
Enzymol. 202 (1991) 591-620.

[189] L. You, F.H. Arnold, Directed evolution of subtilisin E in
Bacillus subtilis to enhance total activity in aqueous
dimethylformamide, Protein Eng. 9 (1996) 77-83.

[190] H. Zhao, F.H. Arnold, Functional and nonfunctional mutations
distinguished by random recombination of homologous
genes, Proc. Natl. Acad. Sci. USA 94 (1997) 7997-8000.

[191] H. Zhao, F.H. Arnold, Directed evolution converts subtilisin
E into a functional equivalent of thermitase, Protein
Eng. 12 (1999) 47-53.

[192] H. Zhao, L. Giver, Z. Shao, J.A. Affholter, F.H. Arnold,
Molecular evolution by staggered extension process (StEP)
in vitro recombination [see comments], Nat. Biotechnol. 16
(1998) 258-261.

[193] H. Zhao, Y. Li, F.H. Arnold, Strategy for the directed
evolution of a peptide ligase, Ann. NY Acad. Sci. 799
(1996) 1-5.

[194] L. Zhu, Y. Ji, Protein engineering on subtilisin E, Chin. J.
Biotechnol. 13 (1997) 9-15.








Exhibit 8 





BOSTON, MA 



nature

7 April 1988 

Vol. 332 Issue no. 6164 



A view across sand dunes in the Sahara. 
A study of wind-driven sand transport 
in the north-western Algerian Sahara 
identifies a previously unrecognized 
mechanism, page 532. (Photo: Frank 
Lane.) / 



THIS WEEK

Crime-fighting advance

Using the DNA polymerase
chain reaction, DNA can now
be typed from a single hair. As
hairs are one of the most frequently
found forms of evidence
at scenes of crimes the consequences
for forensic science
are considerable, page 543.



|jg!nebula cycle 

pnty years after they were 
predicted, a new class of cosmological
X-ray source is discovered.
Ring nebula NGC6888 is
the first, pages 518 and 486.

Resistance evasion

A bacterial pathogen of the
pepper plant that has mutated
to evade host recognition has a
transposable element in a gene
responsible for the plant's hypersensitive
response, page 541.

'Greenhouse' gas rising

Levels of atmospheric methane,
a candidate for contributing to
global warming, are increasing.
Radiocarbon data suggest that
over 30 per cent of atmospheric
methane is derived from fossil
carbon, pages 522 and 489.

Developmental switch 

The switch from mitosis to 
meiosis in yeast has been pinned 
down to the inhibition of a 
protein kinase by a product of a 
gene specifically activated in
diploid cells, page 509. 



Lochs more bonnie

Have reductions in sulphur
emissions and acid rain deposition
in the past decades led to
improvements in the environment?
Chemical and diatom
analyses of a pair of Scottish
lochs give some of the answers,
page 530.

Brain power 

Electron microscopy shows the
brain protein MAP 1C, thought
to be responsible for the transport
of cytoplasmic organelles,
to be structurally similar to
dynein, the force-generating
protein in cilia and flagella. See
page 561.

Titanic collisions 

Earthly laboratory experiments 
provide evidence to support the 
idea that the nitrogen gas pres- 
ent on Saturn's moon Titan 
formed from ammonia as a 
result of high-velocity collisions 
with meteors, page 520. 

Great Lakes battle 

Despite an 'invasion' from the 
north by a voracious predator, 
the factors limiting the algal 
biomass in Lake Michigan seem 
to be related to nutrient supply, 
not a prey/predator balance, 
pages 537 and 491 . 

Guide to Authors 

Facing page 568. 




arur^ (ISSN ()t)2H-lHt.U>) i«puhlishcJ weekly on Thursday, cxcenl the last week in December, 
y Macmillan Magazines Lid(-t Liiile Essex Street. London WC2R 3LF). Annual subscriptiun 
or USA and Canada USS25lt (insiiiutiunal/corporatc), USSI23 (individual making personal 
ymcnt). USA and Canadian orders to: Nature, Subscnniion Dept. PO Box 7663, Teancck. NJ 
666-9837. USA. Other ordcn to f\fafure. Brunei Road. Basinuioke. Hants RG21 2XS. UK. 
' nd class postage paid ul New York. NY 100)2 and addiiionaTmailtngofHces. Authorization 
^ocopy material for internal or persona) use . or iniernat or personal use of specific clients, 
nted ny Namrt to libraries and others registered with the Copyright Clearance Center 
yrVwuaciional Reporting Service, provided the base fee ofS 1 .w a copy plus $0. 10 a page 
_ d direct to CCC, 21 Congress Street, Salem. MA 01970, USA. Ideniincation code for 
^nlrri* OiDQS-0836^ $1.00 $0.10. US Postmaster send address chances to: Nature, 65 
leecker Sireci. New York. NY KXll 2. Published in Japan by Nature Jiipun K.K.. Shin-Mttxukc 
ldg.36IchIgayaTamuchi. Shin juku-ku. Tokyo 1 62. Japan. © If UK Macnultan Mugazines Ltd. 



OPINION 



A united Europe in 1992? ■ Squaring the circle 
Windows copyright? 

NEWS 



US-Japan agreement ■ AIDS drug ■ Australian reform ■
Space settlement ■ Congress and NIH ■ Armenia/
Azerbaijan ■ Sequencing yeast genome ■ UK wind power
■ Simultaneous classroom ■ UK defence spending ■
Superconductors ■ Sea to pond in Japan ■ Leningrad
library fire ■ Correspondence 475-482



NEWS AND VIEWS 




Is the Earth alive or dead? David Lindley 483
The ras oncogene: A structure and some function
Irving S Sigal 485
Structure of a ring nebula J C Raymond 486
Cretaceous unity and diversity Henry Gee 487
Topology: Mysteries of four dimensions
John D S Jones 488
Sources of increased methane G I Pearman &
P J Fraser 489
Developmental neurobiology: The milieu is the message
N Joan Abbott 490
Why Lake Michigan is not green Robert M May 491
Obituary: Sewall Wright (1889-1988)
John Maynard Smith 492
Particle physics: New phase for an old theory?
R D Peccei 493
Daedalus: Salt of the earth



SCIENTIFIC CORRESPONDENCE 

Has the north-east Atlantic become rougher?
D J T Carter & L Draper 494
Assumptions about suicidal behaviour of aphids
M K McAllister & B D Roitberg 494
A new parameter for sex education H Sies 495
What are the masses of elementary particles? I J Good 495

BOOK REVIEWS



Who Got Einstein's Office? Eccentricity and Genius at the 
Institute for Advanced Study by E Regis Daniel J Kevles 
Seventy-five Years in Ecology: The British Ecological 
Society by J Sheail Kenneth Mellanby
Molecules and Morphology in Evolution: Conflict or
Compromise? C Patterson ed Vincent Sarich
Biogeography and Plate Tectonics by J C Briggs Barry Cox
Cell-to-Cell Communication W C De Mello ed
Daniel Goodenough 

ARTICLES 



497 
498 
499 

500 



Response of a general circulation model to a prescribed 
Antarctic ozone hole 

J T Kiehl, B A Boville & B P Briegleb 501
Gas compression and jet formation in cavities collapsed by 
a shock wave 

J P Dear, J E Field & A J Walton 505 

A specific inhibitor of the ran1+ protein kinase regulates
entry into meiosis in Schizosaccharomyces pombe
M McLeod & D Beach 509

Contents continued ►



: t 



564 



LETTERS TO NATURE



NATURE VOL. 332 7 APRIL 1988 




[Plot for Fig. 2; x-axis: log [inhibitor]]

Fig. 2 Inhibition of 125I-labelled pooled human IgG binding to
high affinity Fc receptors (FcRI) on U937 cells by monomeric
mouse IgG2b immunoglobulins. (○), Wild type IgG2b; (●),
Glu235→Leu mutant IgG2b. For methods see Fig. 3 legend.



o 




Fig. 3 Scatchard plot of *"l-labellcd mutant Glu235 Leu mouse 
IgG2b binding to high affinity receptors (FcRI) on U 937 cells, r. 
Number of moles of *«I.(Glu 235-^ Uu) mouse IgG2b antibody 
bound per mole of cells. A. Concentration of free 1-mutant 
lgG2b^ The number of receptors per cell is lower than those 
previously reported'* **, but a Scatchard analysis of l-labelled 

. pooled human IgG binding to the U 937 cells was similar (not 
shown); The diminished values for receptor number may be caused 
by growing U 937 lo high cell concentrations (0.9 x 10 per ml>t- 
Metbods. The IgG-FcRI binding assay was essentially as previously 

' ' described^ except that after introduction of water-immiscible oil 
to the equilibrium mixture followed by rapid centrifugation. the 
pelleted cells (bound »"l-lgG) and medium (free '"I-IgG) were 
separated by slicing through the tube within the oil layer. 

(cleaved between 233 and 234)'« resulted in a loss of binding 
to human FcRI*'-^**. although in these two cases the two CH2 
domains of the antibody are no longer tethered together by the 
hinge disulphides. In the alignment of ref. 12, antibodies with 
substitutions at residues 231 and 233 still bind tightly to FcRI, 
but those with changes at residue 234 have a reduced affinity. 
Furthermore residues 236-238 are completely conserved, except 
in mouse IgGl and human IgG2. which do not bind to human 
FcRI. Much of the link, in particular residues 234-238. may 
therefore be required for binding to human FcRI. 

The hinge link is mobile in the crystallographic structure of 
human Fc^' and is accessible to proteolytic attack. Thus papain 
cleaves between residues 233 and 234 in mouse IgG2a and 
lgG2b**i pepsin between residues 234 and 235 in human IgGl 
and residues 238 and 239 in mouse IgGl^; thermolysin between 
residues 234 and 235 in human IgG1. The facile proteolysis



of several IgG isotypes in this region may simply reflect the
underlying design of the FcRI binding site. The site appears to
be accessible and flexible and would permit, for example, a
hinge dislocation on binding to FcRI.

In conclusion, our results suggest that the hinge link, either
as a single flexible strand or paired with the strand from the
other heavy chain, is a major determinant in binding of antibody
to FcRI, and we would predict that changing Leu 235 for
glutamic acid (and perhaps other side chains) would destroy
the interaction of human IgG1 or IgG3 with FcRI. The possibility
of turning on and off the interaction of antibody with human
FcRI could help dissect the role of this receptor in phagocytosis
and cell mediated lysis and in antibody therapy. Furthermore
in imaging of solid tumours, eliminating interactions with FcRI
could help reduce background due to antibody binding to cells
with high affinity receptors in the lymphatics, liver and spleen.

We thank M. S. Neuberger for the mouse IgG2b expression 
vector. M. S. Neuberger and C. Milstein for advice, and M. 
Clark for comments on the draft. This work was supported by 
the Medical Research Council and the Wellcome Trust. D.R.B.
is a Jenner Fellow of the Lister Institute of Preventive Medicine. 

Received I J January; accepted 3 March 1988.
I. Burton. D. R- Motee. tmmun. 12, 161-206 (1985). m Try lt<ni\ l 

3. Shen. U. Guyre. P. M. & F.ngcr. M. W. 

4. Graziano, R. R A Fangcr. M. W. / /mm»n. 139. 3536-3541 (»M7) 

5. Karpoviky. B.. T.tus. J. A.. Siephany. D. A. & Segal » ^ yj'/^ /^rd 1 60, 1 686- 1701 ( 1984). 

6. Anderwn. C. L. & Looncy. R- >■ ^Way 7. 264-266 (1987). 

7. Frangione. 8. A Milscdn, C. Nawrt 216. 939-9^' , . „ «i cr; (19*41 

8. Woor J. M.. Nik Jaafar. M., JeBeri.. R. & Burton. D. R. Mal«. Immun. 21, 523-527 (19MJ. , 

9. Leatherbarrow. R. J. rl at Mottc Immun. 22, ,172(1986) 
10. Partridge. U J., woof. J. M..Jtfltri5. R.A Bunon.D.K Molec fmmunl^^^^ 

11 Klein M. e» P*oc. natn. Acod. Set U5.A 78. 524-528 (1981). ;,„,.q«i t 

11: wS.U M. Partridge. L J., Jcncris.!^* Burton. D.R.^ iT«2oX 

13. Ncul>erger. M. S. & William*. C, T- Phil Trans, fff//-'7; ffigj" 

14. Andenon. C. U & Abraham. G. N. / Immun. 125 2735-"41 (1980). 

15. Kurtander. R. J, & Batkcr. J. / din, Invrzt 69, 1-8 (1982) ^ ^ „ i Imm-n. 129 '* 

16. Fric. L F.. Hall. R. P. Uwlcy. T. J.. Crabirce. G. R. St Frank. M. M. / Immun. 129.^ 

17. HJ^eybSl! *.*m"4 Sianworth. D. R. Immunology "'-fjj* < 

18. Franku.. T. W Binhtein. B. K. Biochemistry njij"-*" (1978). 

19. Ratdiffe. A. A Stanworth. D. R. Immunology 50. 93-100 O'"'- 

20. McCool. D.. Birahtein. B. K. & Paimer. R. H. / Immun. 135, 1973-1980 (I98S). 

21. Deiscnhofer, J. Biochemistry 20, 2361-2370 (1981). 

22 Fmngione. B. & MiUlein. C. /. motec. Biol 33. 893-906 (1968). 

23 Svatii. J. A Milstein. C. Nature 22», 930-935 (1970). 

24. Burton, D. R. Immun. Today 7. J'''*^^ „ ,,o«v 

25. Carter. P., BedoucIIc. H. A Winter. G. NucMcAdds R*s. 13, ^^31-4443 t ™)- 

26. Raychiudhuri. G.. McCool, D. A Panler. R. H. Motec Immun. 22, 1009-1019 (1985). ■ 



Dissecting the catalytic triad of 
a serine protease 

Paul Carter & James A. Wells 

Department of Biomolecular Chemistry, Genentech Inc.,
460 Point San Bruno Boulevard, South San Francisco,
California 94080, USA




' ^ I 

Serine proteases are present in virtually all organisms and function
both inside and outside the cell; they exist as two families, the
'trypsin-like' and the 'subtilisin-like', that have independently
evolved a similar catalytic device characterized by the Ser, His,
Asp triad, an oxyanion binding site, and possibly other determinants
that stabilize the transition state (Fig. 1). For Bacillus
amyloliquefaciens subtilisin, these functional elements impart a
total rate enhancement of at least 10^9 to 10^10 times the non-enzymatic
hydrolysis of amide bonds. We have examined the
catalytic importance and interplay between residues within the
catalytic triad by individual or multiple replacement with
alanine(s), using site-directed mutagenesis of the cloned
B. amyloliquefaciens subtilisin gene. Alanine substitutions were
chosen to minimize unfavourable steric contacts and to avoid
imposing new charge interactions or hydrogen bonds from


NATURE VOL. 332 7 APRIL 1988          LETTERS TO NATURE

Table 1  Kinetic parameters of mutant subtilisins with the substrate N-succinyl-L-Ala-L-Ala-L-Pro-L-Phe-p-nitroanilide at pH 8.60

                        Active site configuration
Enzyme                  Ser221  His64  Asp32    kcat (s^-1)         Km (µM)    kcat/Km (s^-1 M^-1)    kcat(mutant)/kcat(S24C)
Wild type                 +       +      +      (4.4±0.1)×10^1      180±10     (2.5±0.1)×10^5         0.74±0.01
S24C                      +       +      +      (5.9±0.2)×10^1      220±20     (2.7±0.2)×10^5         1
S24C:S221A                -       +      +      (3.4±0.1)×10^-5     420±40     (8.2±0.6)×10^-2        (5.8±0.1)×10^-7
S24C:H64A                 +       -      +      (3.8±0.2)×10^-5     390±50     (9.6±1.0)×10^-2        (6.4±0.2)×10^-7
S24C:D32A                 +       +      -      (2.3±0.2)×10^-3     480±80     4.7±0.7                (3.8±0.2)×10^-5
S24C:D32A:H64A            +       -      -      (2.6±0.1)×10^-4     270±50     (9.4±1.6)×10^-1        (4.3±0.1)×10^-6
S24C:H64A:S221A           -       -      +      (2.8±0.2)×10^-5     290±40     (9.6±1.3)×10^-2        (4.8±0.2)×10^-7
S24C:D32A:S221A           -       +      -      (2.8±0.1)×10^-5     310±40     (9.2±0.9)×10^-2        (4.8±0.1)×10^-7
S24C:D32A:H64A:S221A      -       -      -      (3.0±0.1)×10^-5     230±20     (1.3±0.1)×10^-1        (5.1±0.1)×10^-7

No enzyme                 none                  kbuffer (s^-1): (1.1±0.1)×10^-8                       kbuffer/kcat(S24C): (1.9±0.1)×10^-10

Mutants are abbreviated by the single-letter code for the wild-type amino acid followed by its codon position and the amino acid replacement; multiple mutants are designated by listing single mutant components separated by colons (for example, double mutant Ser24 to Cys, Ser221 to Ala is designated S24C:S221A). Construction of the mutants S24C and H64A and the double mutant S24C:H64A was as described. The mutations D32A and S24C were constructed simultaneously using a 48-mer oligonucleotide and the S221A mutant was constructed by cassette mutagenesis. The remaining multiple mutants were constructed by 3-way ligations using a 6 kb EcoRI/BamHI fragment from the vector pSS5 (B. Cunningham, D. Powers, and J. W., unpublished) and two subtilisin fragments from appropriate mutants. Mutant constructions were verified by dideoxy sequencing. Mutant plasmids were expressed in a protease-deficient strain of B. subtilis, BG2036. Rescue of active site mutants by co-culturing with the mutant A48E and purification was as described. Mutant subtilisins were assayed with the substrate N-succinyl-L-Ala-L-Ala-L-Pro-L-Phe-p-nitroanilide (Sigma). Six hydrolysis assays were performed simultaneously against substrate blanks in 1 ml 100 mM Tris-HCl (pH 8.60), 4% (v/v) dimethylsulphoxide at (25±0.2) °C using a Kontron Uvikon 860 spectrophotometer. Initial reaction rates were determined from the increase in absorbance at 410 nm on release of p-nitroaniline (ε410 = 8,480 M^-1 cm^-1). The total substrate concentration in each assay was determined from A410 after complete hydrolysis. The initial rate data were fitted to the Michaelis-Menten relationship using least squares analysis to determine Km and Vmax. Turnover number (kcat) was calculated from the spectrophotometrically determined enzyme concentration (E280 0.1% = 1.17). Enzyme concentrations in the assays were 30-110 µg ml^-1 for the active site mutants and 1 µg ml^-1 for the wild type and S24C enzymes. Catalytic triad residues are represented by (+) and Ala replacements by (-). Data are presented ± standard errors and the spontaneous hydrolysis rate of substrate under these conditions is shown as 'buffer'.
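The Table 1 legend states that initial rates were fitted to the Michaelis-Menten relationship by least squares and that kcat was obtained from Vmax and the enzyme concentration. The sketch below is one possible way to do such a fit, not the authors' procedure: it assumes an unweighted fit, uses a one-dimensional golden-section search over log(Km), and runs on synthetic, noise-free numbers. The function names and all example values are hypothetical.

```python
import math

def mm_rate(s, vmax, km):
    """Michaelis-Menten initial rate: v = Vmax*[S] / (Km + [S])."""
    return vmax * s / (km + s)

def best_vmax(km, s_vals, v_vals):
    """For a fixed Km the model is linear in Vmax, so least squares has a closed form."""
    f = [s / (km + s) for s in s_vals]
    return sum(v * fi for v, fi in zip(v_vals, f)) / sum(fi * fi for fi in f)

def sum_sq_error(km, s_vals, v_vals):
    """Sum of squared residuals at this Km, using the best Vmax for that Km."""
    vmax = best_vmax(km, s_vals, v_vals)
    return sum((v - mm_rate(s, vmax, km)) ** 2 for s, v in zip(s_vals, v_vals))

def fit_michaelis_menten(s_vals, v_vals, km_lo=1.0, km_hi=1.0e5, iterations=100):
    """Golden-section search over log(Km); returns (Vmax, Km)."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(km_lo), math.log(km_hi)
    for _ in range(iterations):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        err_c = sum_sq_error(math.exp(c), s_vals, v_vals)
        err_d = sum_sq_error(math.exp(d), s_vals, v_vals)
        if err_d > err_c:
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    km = math.exp(0.5 * (a + b))
    return best_vmax(km, s_vals, v_vals), km

def turnover_number(vmax, enzyme_conc):
    """kcat = Vmax / [E]total, with both in the same concentration units."""
    return vmax / enzyme_conc

# Synthetic example (hypothetical values, not data from the paper).
substrate_uM = [50.0, 100.0, 200.0, 400.0, 800.0, 1200.0]
rates_uM_per_s = [mm_rate(s, 6.0e-5, 230.0) for s in substrate_uM]
vmax_fit, km_fit = fit_michaelis_menten(substrate_uM, rates_uM_per_s)
kcat_fit = turnover_number(vmax_fit, 2.0)  # assumes 2 µM enzyme
```

Searching over Km alone keeps the sketch dependency-free, because Vmax has a closed-form optimum for each trial Km. A weighted or fully nonlinear fit would be the natural refinement for real data with unequal errors.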



substituted side chains. In contrast to the effect of mutations in residues involved in substrate binding, the mutations in the catalytic triad greatly reduce the turnover number and cause only minor effects on the Michaelis constant. Kinetic analyses of the multiple mutants demonstrate that the residues within the triad interact synergistically to accelerate amide bond hydrolysis by a factor of ~2 × 10^6.

Subtilisin is synthesized as a membrane-associated precursor (preprosubtilisin). When expressed in a protease-deficient strain of B. subtilis, mature B. amyloliquefaciens subtilisin is efficiently released into the medium after autoproteolytic cleavage. Mutagenesis of the catalytic residues in subtilisin (which essentially inactivates the protease) disrupts this processing, but processing can be restored by co-culturing the mutants with a small amount of a B. subtilis strain (called a 'helper') harbouring an active subtilisin gene. We have constructed a series of active site mutants in which the catalytic triad residues are replaced by alanine in every possible combination (ref. 12, Table 1). Each mutant also contains a surface-accessible Ser24 to Cys mutation designated S24C (mutant enzymes are named using the single letter code for amino acids to indicate the substitutions made, see Table 1). The S24C substitution permits reversible attachment to an activated thiol sepharose column, thereby eliminating traces of contaminating helper subtilisin which is cysteine-free.

The hydrolysis of the substrate (N-succinyl-L-Ala-L-Ala-L-Pro-L-Phe-p-nitroanilide) by most of the active site mutants produced only small absorbance changes (ΔA410 of 0.01 to 0.10) over long periods (up to 12 h), yet the data exhibit typical Michaelis-Menten saturation behaviour (Fig. 2) with standard errors almost as small as those for wild-type subtilisin (Table 1). No detectable loss of catalytic activity occurred even during the longest kinetic runs. In addition, the background (non-enzymatic) hydrolysis of substrate was ≤25% of the catalysed rate for even the least active enzymes (Fig. 2). The non-enzymatic hydrolysis was subtracted directly from the enzyme assays using blank substrate solutions in a double beam spectrophotometer.
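To give a sense of scale for these absorbance changes, the sketch below converts a ΔA410 reading into a p-nitroaniline concentration and an average rate using the Beer-Lambert law and the extinction coefficient quoted in the Table 1 legend. The 1 cm path length is an assumption (the text does not state the cuvette geometry), and the function names are illustrative only.

```python
EPSILON_410 = 8480.0   # M^-1 cm^-1 for p-nitroaniline, from the Table 1 legend
PATH_LENGTH_CM = 1.0   # assumption: standard 1 cm cuvette, not stated in the text

def product_concentration(delta_a410, path_cm=PATH_LENGTH_CM):
    """Beer-Lambert law: concentration (M) = ΔA / (ε · l)."""
    return delta_a410 / (EPSILON_410 * path_cm)

def average_rate(delta_a410, seconds, path_cm=PATH_LENGTH_CM):
    """Average rate of product release (M s^-1) over the observation window."""
    return product_concentration(delta_a410, path_cm) / seconds

# The largest change quoted in the text: ΔA410 = 0.10 over a 12 h run.
released_molar = product_concentration(0.10)
rate_molar_per_s = average_rate(0.10, 12 * 3600)
```

Under these assumptions a ΔA410 of 0.10 corresponds to roughly 12 µM of product, which illustrates why long runs and direct blank subtraction were needed.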

Kinetic analysis of the active site single mutants (Table 1) shows that replacement of the catalytic serine, histidine, or aspartate causes a drop in turnover number (kcat) by factors of 2 × 10^6, 2 × 10^6 and 3 × 10^4, respectively. The 100-fold lower values of kcat which result from substitution of Ser221 and His64, compared with Asp32, are consistent with their more central role in catalysis (Fig. 1). Each mutation causes a small increase in the Michaelis constant (Km) (~2-fold) which may result from slightly altered substrate binding contacts. (Wild-type subtilisin has a two-step enzyme mechanism where deacylation is >33 times faster than acylation, so that Km is a good approximation of the enzyme-substrate dissociation constant (Ks). As the enzyme mechanism must be changed for at least some of the mutants (see below), Km may be less than Ks.)
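The parenthetical argument about Km and Ks follows from the standard steady-state expressions for a two-step (acyl-enzyme) mechanism. The block below is a textbook sketch, not taken from the paper; the symbols k2 (acylation) and k3 (deacylation) are assumed labels.

```latex
% Two-step mechanism (k_2 = acylation, k_3 = deacylation; labels are assumptions):
%   E + S  <=>  E.S  --k_2-->  E-Ac + P_1  --k_3-->  E + P_2,   with K_s = [E][S]/[E.S]
\[
  k_{\mathrm{cat}} = \frac{k_2 k_3}{k_2 + k_3},
  \qquad
  K_m = K_s \, \frac{k_3}{k_2 + k_3}
\]
% Wild type: deacylation more than 33 times faster than acylation (k_3 > 33 k_2)
\[
  \frac{K_m}{K_s} = \frac{1}{1 + k_2/k_3} > \frac{33}{34} \approx 0.97,
  \qquad
  k_{\mathrm{cat}} \approx k_2
\]
% If deacylation instead becomes rate determining (k_3 \ll k_2)
\[
  K_m \approx K_s \, \frac{k_3}{k_2} \ll K_s,
  \qquad
  k_{\mathrm{cat}} \approx k_3
\]
```

So for the wild-type enzyme Km lies within about 3% of Ks, whereas a mutant with slow deacylation could show a Km well below Ks.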

Additional mutagenesis of the S24C:S221A enzyme to replace either Asp32, His64 or both, causes essentially no further change in kcat or Km (Table 1). By comparison, further mutagenesis of the S24C:D32A parent enzyme to substitute His64 or both His64 and Ser221, further reduces kcat 9- and 76-fold, respectively, with essentially no change in Km. These data suggest that His64 provides a catalytic advantage of ~10-fold to the S24C:D32A enzyme, and that Ser221 provides ~10-fold advantage to the S24C:D32A:H64A enzyme. As with the S24C:S221A family of mutants, additional mutations in the S24C:H64A enzyme to replace Ser221 or both Ser221 and Asp32 do not affect kcat. But replacement of Asp32 alone in the S24C:H64A mutant to give S24C:D32A:H64A actually increases kcat 7-fold. Thus, Asp32 is a liability to the S24C:H64A enzyme, possibly because of an unfavourable electrostatic effect upon catalysis (see below).

The single and multiple mutant analyses show that the catalytic effects are non-additive in two ways. First, there is a gross discrepancy between the relative drop in kcat resulting from the triple alanine mutant (2 × 10^6, Table 1) compared with






Table 2  Kinetic parameters of mutant subtilisins with the substrate N-succinyl-L-Ala-L-Ala-L-Pro-L-Phe-p-nitroanilide at pH 9.70

                        Active site configuration
Enzyme                  Ser221  His64  Asp32    kcat (s^-1)         Km (µM)     kcat/Km (s^-1 M^-1)    kcat(pH 9.7)/kcat(pH 8.6)
Wild type                 +       +      +      (6.3±0.1)×10^1      440±30      (1.4±0.1)×10^5         1.4±0.1
S24C                      +       +      +      (8.1±0.2)×10^1      560±30      (1.5±0.1)×10^5         1.4±0.1
S24C:S221A                -       +      +      (5.4±0.3)×10^-5     650±90      (8.4±1.0)×10^-2        1.6±0.1
S24C:H64A                 +       -      +      (1.9±0.1)×10^-4     1300±150    (1.5±0.2)×10^-1        5.1±0.2
S24C:D32A                 +       +      -      (1.8±0.1)×10^-2     1400±120    (1.3±0.1)×10^1         7.8±0.4
S24C:D32A:H64A            +       -      -      (1.8±0.1)×10^-3     460±40      3.8±0.3                6.9±0.3
S24C:H64A:S221A           -       -      +      (5.2±0.2)×10^-5     480±60      (1.1±0.1)×10^-1        1.9±0.1
S24C:D32A:S221A           -       +      -      (5.9±0.3)×10^-5     460±80      (1.3±0.2)×10^-1        2.1±0.1
S24C:D32A:H64A:S221A      -       -      -      (7.8±0.3)×10^-5     730±70      (1.1±0.1)×10^-1        2.6±0.1

No enzyme                 none                  kbuffer (s^-1): (2.8±0.1)×10^-8                        kbuffer(pH 9.7)/kbuffer(pH 8.6): 2.5±0.1

Kinetic data were determined as for Table 1 except that 100 mM 3-[cyclohexylamino]-2-hydroxy-1-propanesulphonic acid buffer (pH 9.70) was used. Ionic strength was normalized with NaCl.



the product of the relative effects from the three single alanine mutants (~10^17). Second, the double alanine mutants that retain singly the catalytic Ser, His or Asp are only a factor of 8, 0.9 or 0.9 larger in kcat, respectively, than the triple alanine mutant. The product of these values (~6) is much below the relative kcat value of 2 × 10^6 for wild type (S24C) compared with the triple alanine mutant. Thus, non-additive effects are shown either by subtraction of catalytic residues relative to wild-type enzyme or by addition of single catalytic residues relative to the triple alanine mutant.
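The non-additivity argument can be re-traced numerically. The sketch below uses kcat values as read from Table 1 (the exponents are taken as consistent with the fold-changes quoted in the text) and compares the product of individual effects with the observed combined effect. Because it uses unrounded table entries, its products differ slightly from the rounded figures quoted in the text. Variable and function names are illustrative.

```python
import math

# kcat values (s^-1) at pH 8.60 as read from Table 1; S24C is the reference enzyme.
KCAT = {
    "S24C": 5.9e1,
    "S24C:S221A": 3.4e-5,
    "S24C:H64A": 3.8e-5,
    "S24C:D32A": 2.3e-3,
    "S24C:D32A:H64A": 2.6e-4,        # retains only Ser221
    "S24C:D32A:S221A": 2.8e-5,       # retains only His64
    "S24C:H64A:S221A": 2.8e-5,       # retains only Asp32
    "S24C:D32A:H64A:S221A": 3.0e-5,  # triple alanine mutant
}
REFERENCE = "S24C"
TRIPLE = "S24C:D32A:H64A:S221A"

def fold_drop(enzyme):
    """How many times lower kcat is than for the S24C reference."""
    return KCAT[REFERENCE] / KCAT[enzyme]

def fold_gain(enzyme):
    """How many times higher kcat is than for the triple alanine mutant."""
    return KCAT[enzyme] / KCAT[TRIPLE]

singles = ["S24C:S221A", "S24C:H64A", "S24C:D32A"]
doubles = ["S24C:D32A:H64A", "S24C:D32A:S221A", "S24C:H64A:S221A"]

# First comparison: removing residues one at a time versus all together.
predicted_if_additive = math.prod(fold_drop(e) for e in singles)
observed_triple_drop = fold_drop(TRIPLE)

# Second comparison: adding residues back one at a time to the triple mutant.
product_of_single_gains = math.prod(fold_gain(e) for e in doubles)
```

If the residues acted independently, both products would match the observed factor of about 2 × 10^6. Instead the first overshoots it by roughly ten orders of magnitude and the second falls short by more than five.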

Replacement of residues in the catalytic triad with alanines necessarily perturbs the enzyme mechanism. In particular, it has been observed that in the absence of the catalytic His64 in subtilisin or the catalytic Asp102 in trypsin, there is a marked increase in the hydroxide dependence of catalysis between pH 8 and 10 compared to the wild-type enzymes. Comparisons of the kinetic parameters for all of the catalytic triad mutants at pH 9.70 and pH 8.60 (Table 2) show that those retaining Ser221 have a substantially stronger pH dependence of kcat (increased 5- to 8-fold) than enzymes containing an intact catalytic triad (increased 1.4-fold), or enzymes lacking Ser221 (increased 1.6- to 2.6-fold), or when compared with the non-enzymatic rate (increased 2.5-fold). For all enzymes the Km values at pH 9.70 are increased between 1.5- and 3.3-fold. Preliminary evidence suggests that this effect upon Km may result (at least partially) from ionization of Tyr104, resulting in electrostatic repulsion of the P5 succinyl group (see Fig. 3, and D. Estell, T. Graycar, D. Powers and J. A. Wells, unpublished results).
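The grouping of enzymes by pH sensitivity can be reproduced from the two tables. The sketch below takes kcat values as read from Tables 1 and 2 and sorts the pH 9.70 / pH 8.60 ratios into the three classes named in the paragraph. The class labels and helper names are assumptions made for illustration.

```python
# kcat values (s^-1) as read from Table 1 (pH 8.60) and Table 2 (pH 9.70).
KCAT_PH_8_60 = {
    "Wild type": 4.4e1, "S24C": 5.9e1,
    "S24C:S221A": 3.4e-5, "S24C:H64A": 3.8e-5, "S24C:D32A": 2.3e-3,
    "S24C:D32A:H64A": 2.6e-4, "S24C:H64A:S221A": 2.8e-5,
    "S24C:D32A:S221A": 2.8e-5, "S24C:D32A:H64A:S221A": 3.0e-5,
}
KCAT_PH_9_70 = {
    "Wild type": 6.3e1, "S24C": 8.1e1,
    "S24C:S221A": 5.4e-5, "S24C:H64A": 1.9e-4, "S24C:D32A": 1.8e-2,
    "S24C:D32A:H64A": 1.8e-3, "S24C:H64A:S221A": 5.2e-5,
    "S24C:D32A:S221A": 5.9e-5, "S24C:D32A:H64A:S221A": 7.8e-5,
}
TRIAD_SUBSTITUTIONS = {"D32A", "H64A", "S221A"}

def classify(enzyme):
    """Assign an enzyme to one of three classes from its mutation list."""
    substitutions = set(enzyme.split(":"))
    if not substitutions & TRIAD_SUBSTITUTIONS:
        return "intact triad"
    if "S221A" in substitutions:
        return "lacks Ser221"
    return "retains Ser221, triad incomplete"

def ph_ratios_by_class():
    """Map each class to its list of kcat(pH 9.70) / kcat(pH 8.60) ratios."""
    groups = {}
    for enzyme, kcat_low in KCAT_PH_8_60.items():
        ratio = KCAT_PH_9_70[enzyme] / kcat_low
        groups.setdefault(classify(enzyme), []).append(ratio)
    return groups

groups = ph_ratios_by_class()
```

The ratios computed this way fall in the same three bands as the ratio column of Table 2, with small differences attributable to rounding of the tabulated kcat values.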

For mutants that retain Ser221, the simplest interpretation of the data is that they continue to use Ser221 as the catalytic nucleophile. The presence of Ser221 provides a catalytic advantage of ~10-fold to the S24C:D32A:H64A enzyme and ~100-fold to the S24C:D32A enzyme. Furthermore, replacing His64 in the S24C:D32A enzyme causes kcat to drop ~10-fold, suggesting that His64 functions here to some extent (presumably as a proton acceptor for the nucleophilic Ser221). In addition, if deprotonation of the Ser221 hydroxyl is a prerequisite for nucleophilic attack in these mutants, then it is reasonable for kcat to depend on hydroxide ion concentration, as observed (Table 2). Finally, in the absence of His64, the catalytic aspartate should inhibit deprotonation of Ser221 and have a deleterious electrostatic effect upon kcat, as indeed was found (kcat for S24C:H64A is 10-fold lower than the kcat for S24C:D32A:H64A in Table 1). Like wild-type subtilisin, we anticipate the Ser221 family of enzymes should have a two-step enzyme mechanism. For these mutants, if deacylation is rate determining, it is possible that the Km values are substantially less than the Ks values.

For the S24C:S221A family of enzymes, the reaction cannot proceed by the usual serine acyl-enzyme intermediate. Instead direct attack of water on the scissile peptide bond may occur to produce a single tetrahedral intermediate that collapses to give the hydrolysed products. Nucleophilic attack by water is




Fig. 1 Schematic diagram showing the rate limiting acylation step in the hydrolysis of peptide bonds by subtilisin. In going from the Michaelis enzyme-substrate complex (E·S) to the transition state complex (E·S‡), the proton on Ser221 (darkly shaded) is transferred to His64, thus permitting nucleophilic attack on the scissile peptide bond. The proton is then transferred to the amine leaving group to generate the acyl-enzyme intermediate (E-Ac). Asp32 (as for Asp102 in trypsin) is believed to position the correct tautomer of His64 for catalysis in the E·S complex and to stabilize the protonated form of His64 in the E·S‡ complex. Some of the hydrogen bonds that form in the E·S‡ complex are shown by dotted lines. In deacylation these steps are reversed and water (as the nucleophile) replaces the amine leaving group.



[Fig. 1 schematic: residue labels Asn155, His64, Asp32 and Ser221; the final panel is labelled E-Ac]





consistent with the weak hydroxide dependence of kcat for the S221A-containing mutants. The lack of a deleterious electrostatic effect from Asp32 is also consistent with a neutral attacking nucleophile (compare S24C:H64A:S221A with S24C:D32A:H64A:S221A in Table 1). It is unlikely that the S221A group of enzymes use the other members of the catalytic triad because there is no additional kinetic advantage for including the His64 or Asp32. (Strictly, we cannot be sure that the residual members of the triad are catalytically inert. We simply cannot detect any catalytic advantage for them over the residual activity resulting from determinants unrelated to the triad; see below.) Preliminary X-ray analysis of the S221A enzyme indicates no large structural change except for the Ser221 to Ala substitution (R. Bott and M. Ultsch, personal communication). More kinetic and structural data will be necessary, however, to substantiate the possible mechanisms discussed above.

The small values of kcat for the active site mutants raise questions regarding protease contaminants or assay artefacts. The following evidence argues strongly against these possibilities. (1) Unlike wild-type subtilisin, the mutant enzymes are not inhibited by phenylmethylsulphonyl fluoride. (2) Although changes in the Km values are small for these mutants, many are statistically different from wild type (Tables 1, 2). A contamination with helper subtilisin (regardless of amount) would give a constant value for the Km equal to wild type. (3) Many of the active site mutants differ significantly from each other in kcat and Km at pH 8.6 (Table 1), which is inconsistent with a constant contaminant. (4) The mutants differ among themselves and wild type in terms of their pH dependence of kcat (Table 2), a result inconsistent with a fixed protease contaminant. (5) Although the kinetic values reported in Tables 1 and 2 are from the same batch of enzyme, most mutant enzymes have been purified more than once. In every repeat case (data not shown) the kinetic values agree within the standard error limits shown (≤±15% for kcat and Km), even though enzyme yields varied, and purification protocols were sometimes slightly modified. (6) The mutants were expressed in an extracellular protease deficient strain of B. subtilis, purified on activated thiol sepharose, and judged to be >99% pure by silver-stained SDS-PAGE. Moreover, further purification of the S24C:H64A enzyme by native gel electrophoresis gave identical kinetic values as the starting material.

It is formally possible that the residual activity in some or all of these mutants occurs at a non-specific site(s) distinct from the active site. The following points argue for catalysis at the active site. (1) In some cases the kinetic effects are cumulative for mutagenesis at the active site. For example, the kcat values decrease in the following order: S24C > S24C:D32A > S24C:D32A:H64A > S24C:D32A:H64A:S221A (Table 1). (2) The Km values are usually not more than twofold above the wild type value, suggesting continued strong and specific binding (assuming Km ≈ Ks). Furthermore, the active site mutants show a similar pH dependent increase in Km as wild type subtilisin. (3) The substrate preferences for the S24C:D32A and S24C:S221A enzymes toward two other substrates essentially parallel the wild type enzyme (P. C., unpublished results). The substrate specificity of the S24C:H64A enzyme also parallels the wild type except for a strong preference for His P2 substrates (see below). (4) The activity of the S24C:H64A enzyme is heat denaturable (C. Mitchinson, unpublished results) which indicates that the native protein conformation is critical for catalysis. (5) The residual activity for even the least active mutant is still >10^3 fold above the non-enzymatic rate. This catalytic rate is in the range measured for 'good' catalytic antibodies. Taken together these data provide compelling evidence that the residual catalytic activities we have measured are not due to protease contamination, assay artefacts or non-specific catalysis away from the normal active site.

We suggest that the residual activity in the triple mutant is derived from remaining binding determinants which stabilize




[Fig. 2 plot: x-axis [S] (µM), tick marks at 500 and 1000]

Fig. 2 Initial rate of hydrolysis v0 (ΔA410 per unit time) versus the concentration of the substrate N-succinyl-L-Ala-L-Ala-L-Pro-L-Phe-p-nitroanilide [S] in the absence (●) or presence (○) of S24C:D32A:H64A:S221A subtilisin. The background hydrolysis rate (●) was subtracted directly from the rate in the presence of subtilisin to give the enzymatic rate (○). Experiments were performed in 100 mM Tris-HCl, pH 8.60, at 25±0.2 °C, as described in Table 1. Insert (■) shows an Eadie-Hofstee plot of the initial rate data.
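The Eadie-Hofstee plot mentioned in the caption linearizes the Michaelis-Menten equation as v = Vmax − Km·(v/[S]), so a straight-line fit of v against v/[S] gives Vmax as the intercept and −Km as the slope. The sketch below is a generic ordinary least-squares version of that transformation, run on synthetic numbers. It is not a reconstruction of the figure's data, and the function name and example values are hypothetical.

```python
def eadie_hofstee_fit(s_vals, v_vals):
    """Fit v = Vmax - Km * (v / [S]) by ordinary least squares.

    Returns (Vmax, Km). The x variable is v/[S], the y variable is v.
    """
    x = [v / s for s, v in zip(s_vals, v_vals)]
    y = list(v_vals)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, -slope

# Synthetic, noise-free example (hypothetical values): Vmax = 1.0, Km = 230 µM.
substrate_uM = [100.0, 250.0, 500.0, 750.0, 1000.0]
rates = [1.0 * s / (230.0 + s) for s in substrate_uM]
vmax_est, km_est = eadie_hofstee_fit(substrate_uM, rates)
```

Because v appears on both axes, measurement error is correlated between x and y. For that reason this linear form is usually treated as a visual check for Michaelis-Menten behaviour rather than a substitute for a direct least-squares fit.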



the transition state complex outside the catalytic triad. In fact, previous data show that when the hydrogen bond to Asn155 in the oxyanion binding site (Fig. 1) is disrupted by site-directed mutagenesis, there is a 10^2 to 10^3 drop in kcat with little effect upon Km. Additional hydrophobic interactions (Fig. 3) with the P1 substrate side chain and binding interactions with the P2 to P4 substrate residues are estimated to contribute independently factors of 10 to 100. Structural analysis suggests there are additional hydrogen bonds in the transition state complex between the NH of Ser221 and the oxyanion, and between the NH of the P1 substrate residue and the carbonyl of Ser125. Deriving the total catalytic contribution from the sum of these individual binding components may lead to overestimation because of their possible interdependence. Nonetheless, our data indicate that some or all of these determinants are important for stabilizing the tetrahedral transition state complex (contributing >10^3 to kcat), and are not simply required for positioning the substrate for optimal nucleophilic attack by Ser221.
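One way to picture how the overall rate enhancement divides between the triad and the remaining determinants is to convert the rate ratios into free-energy differences with ΔΔG = RT ln(k_fast/k_slow). This conversion is a standard transition-state-theory convention and is not used in the paper itself; the rate constants are as read from Table 1, and the temperature is the 25 °C assay temperature. Names are illustrative.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1
TEMPERATURE = 298.15        # K, the 25 °C assay temperature

# First-order rate constants (s^-1) at pH 8.60 as read from Table 1.
KCAT_S24C = 5.9e1      # intact triad (reference enzyme)
KCAT_TRIPLE = 3.0e-5   # S24C:D32A:H64A:S221A, triad fully replaced
K_BUFFER = 1.1e-8      # spontaneous hydrolysis, "No enzyme" row

def ddg_kj_per_mol(k_fast, k_slow):
    """Free-energy equivalent of a rate ratio: RT * ln(k_fast / k_slow), in kJ/mol."""
    return GAS_CONSTANT * TEMPERATURE * math.log(k_fast / k_slow) / 1000.0

triad_contribution = ddg_kj_per_mol(KCAT_S24C, KCAT_TRIPLE)
residual_contribution = ddg_kj_per_mol(KCAT_TRIPLE, K_BUFFER)
total_enhancement = ddg_kj_per_mol(KCAT_S24C, K_BUFFER)
```

On this scale the triad accounts for roughly 36 kJ/mol and the determinants outside the triad for roughly 20 kJ/mol. The two add up to the total by construction, since the logarithm turns the product of rate factors into a sum.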

From an evolutionary point of view, it is extremely unlikely that the catalytic triad arose in one step rather than involving active intermediates. This view is now apparently complicated by the fact that the residues in the catalytic triad function in an extremely synergistic manner. But, assuming that the present-day enzyme is a reasonable model of its ancestor, there are at least two possible mutagenic pathways that give progressive increases in catalytic rate by stepwise introduction of the residues in the triad. In the first pathway, installing Ser221 followed by His64 and then Asp32 gives progressive increases of 8, 9 and 3 × 10^4 in kcat (Table 1). This progression is even more uniform under alkaline conditions, resulting in increases in kcat of 50, 10 and 5 × 10^3 (Table 2). A second mutagenic pathway is possible by preferential use of a His P2 substrate (Fig. 3) in place of the catalytic His64. We have previously
shown that the Ala64 enzyme has a turnover number of 2 × 10^-2 s^-1 for hydrolysis of a His P2 substrate compared to 8 × 10^-5 s^-1 for an Ala P2 substrate. This catalytic advantage, which we have called 'substrate-assisted catalysis', makes it feasible to reverse the order of introducing His64 and Asp32.
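The first pathway can be followed step by step from the Table 1 entries. The sketch below, with kcat values as read from that table and illustrative names, computes the gain in kcat at each installation step and confirms that the steps multiply up to the overall difference between the triple alanine mutant and S24C.

```python
import math

# kcat values (s^-1) at pH 8.60 as read from Table 1, ordered along the first pathway:
# triple alanine mutant -> install Ser221 -> install His64 -> install Asp32.
PATHWAY = [
    ("S24C:D32A:H64A:S221A", 3.0e-5),  # no triad residues
    ("S24C:D32A:H64A", 2.6e-4),        # Ser221 installed
    ("S24C:D32A", 2.3e-3),             # His64 installed
    ("S24C", 5.9e1),                   # Asp32 installed, triad complete
]

def stepwise_gains(pathway):
    """Fold increase in kcat at each step along the pathway."""
    rates = [kcat for _, kcat in pathway]
    return [after / before for before, after in zip(rates, rates[1:])]

gains = stepwise_gains(PATHWAY)
overall_gain = math.prod(gains)
```

Every step in this ordering raises kcat, which is the sense in which each intermediate along the pathway is more active than the one before it. The last step is by far the largest.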






Fig. 3 Stereoview of a model containing the substrate, N-succinyl-L-Ala-L-Ala-L-Pro-L-Phe-p-nitroanilide (bold lines and filled atoms), bound to the active site of B. amyloliquefaciens subtilisin. Alpha carbons from important enzyme and substrate residues are labelled. In protease substrate nomenclature the substrate may be represented as

NH2-Pn···P1-C(=O)-N(H)-P1'···Pn'-COOH,

where the scissile peptide bond is between the P1 and P1' residues. The E·S model is based upon a preliminary 2.0 Å X-ray structure of a product bound to subtilisin and the succinyl and p-nitroanilide groups were introduced by modelling (R. Bott and M. Ultsch, unpublished data). This model is similar to a previously published complex.

Of course this advantage would apply only to His P2 substrates but would be reasonable if the ancestral enzyme were involved in specific proteolytic processing, for example. Regardless of the exact order of evolutionary events, our mutagenic studies show that inserting catalytic triad residues in a stepwise fashion can produce enzyme intermediates with progressively increased turnover numbers.

In summary, when residues in the catalytic triad are altered separately or together there are large effects on turnover rate, consequent changes in the enzyme mechanism, and only minor effects on the Michaelis constant. The residues in the catalytic triad function in a strongly synergistic fashion and contribute a factor of about 2 × 10^6 to the total catalytic rate enhancement of 10^9 to 10^10. The residual activity from complete replacement of the catalytic triad is not a contaminant or other artefact but results from transition state stabilization from contacts outside the catalytic triad. Finally, despite the synergy between the catalytic triad residues, their sequential introduction is reasonable in terms of both evolution and function.

We thank Dr Rick Bott for help in preparing Fig. 3 and sharing unpublished X-ray coordinates, Dr Polly Moore and Ann Benninger for assistance in data handling, the organic chemistry group at Genentech for synthesis of oligonucleotides, and Drs Tony Kossiakoff, Jack Kirsch and Ron Wetzel for helpful comments on this manuscript.

Received 19 January 1988; accepted 22 February 1988.

1. Stroud, R. M. Scient. Am. 231, 74-88 (1974).
2. Kraut, J. A. Rev. Biochem. 46, 331-358 (1977).
3. Fink, A. L. in Enzyme Mechanisms (eds Page, M. I. & Williams, A.) 159-177 (Roy. Soc.
4. Kossiakoff, A. in Biological Macromolecules and Assemblies Vol. 3 (eds Jurnak, F. A. & McPherson, A.) 370-412 (1987).
5. Zoller, M. J. & Smith, M. Nucleic Acids Res. ...
7911-7925 (1983).
t ttTo^rtl\T'^^^^^^ T. p. * BUell. O. A. ... A..
10. Wel;^. ^i^^n^V^:^^^^^^^^ T. P. . B^Cl. O, . ^ - U...
!}• SrX/->^*~ ^^^^^^^ . W A 317
415-423 (1986).



15. Gutfreund, H. & Sturtevant, J. M. Biochem. J. 63, 656-661 (1956).
16. Craik, C. S., Roczniak, S., Largman, C. & Rutter, W. J. Science 237, 909-913 (1987).
17. Sprang, S. et al. Science 237, 905-909 (1987).
18. Tramontano, A., Janda, K. D. & Lerner, R. A. Science 234, 1566-1570 (1986).
19. Pollack, S. J., Jacobs, J. W. & Schultz, P. G. Science 234, 1570-1573 (1986).
20. Napper, A. D., Benkovic, S. J., Tramontano, A. & Lerner, R. A. Science 237, 1041-1043
21. Bryan, P., Pantoliano, M. W., Quill, S. G., Hsiao, H.-Y. & Poulos, T. Proc. natn. Acad. Sci. U.S.A. 83, 3743-3745 (1986).
22. Morihara, K., Oka, T. & Tsuzuki, H. Archs Biochem. Biophys. ...
23. Morihara, K., Oka, T. & Tsuzuki, H. Biochem. biophys. Res. Commun. ...
24. Robertus, J. D., Kraut, J., Alden, R. A. & Birktoft, J. J. Biochemistry 11, 4293-4303 (1972).
25. Wells, J. A., Vasser, M. & Powers, D. B. Gene 34, 315-323 (1985).
26. Sanger, F., Nicklen, S. & Coulson, A. R. Proc. natn. Acad. Sci. U.S.A. 74, 5463-5467 (1977).
27. Yang, M. Y., Ferrari, E. & Henner, D. J. J. Bact. 160, 15-21 (1984).
28. DelMar, E. G., Largman, C., Brodrick, J. W. & Geokas, M. C. Analyt. Biochem. 99, 316-3
29. Matsubara, H., Kasper, C. B., Brown, D. M. & Smith, E. L. J. biol. Chem. 240, 1125-11
30. Schechter, I. & Berger, A. Biochem. biophys. Res. Commun. 27, 157-162 (1967).
31. Robertus, J. D. et al. Biochemistry 11, 2439-2449 (1972).





Exhibit 9 



The Journal of Biological Chemistry
© 1990 by The American Society for Biochemistry and Molecular Biology, Inc.

Vol. 265, No. 13, Issue of May 5, pp. 7180-7187, 1990
Printed in U.S.A.



Site-directed Mutagenesis Suggests Close Functional Relationship 
between a Human Rhinovirus 3C Cysteine Protease and Cellular 
Trypsin-like Serine Proteases* 

(Received for publication, November 13, 1989) 

Keat-Chye Cheah‡, Louis E.-C. Leong, and Alan G. Porter§

From the Institute of Molecular and Cell Biology, National University of Singapore, Kent Ridge Crescent, Singapore 0511 



Human rhinoviruses, like other picornaviruses, encode a cysteine protease (designated 3C) which cleaves mainly at viral Gln-Gly pairs. There are significant areas of homology between picornavirus 3C cysteine proteases and cellular serine proteases (e.g. trypsin), suggesting a functional relationship between their catalytic regions. To test this functional relationship, we made single substitutions in human rhinovirus type 14 protease 3C at seven amino acid positions which are highly conserved in the 3C proteases of animal picornaviruses. Substitutions at either His-40, Asp-85, or Cys-146, equivalent to the trypsin catalytic triad His-57, Asp-102, and Ser-195, respectively, completely abolished 3C proteolytic activity. Single substitutions were also made at either Thr-141, Gly-158, His-160, or Gly-162, which are equivalent to the trypsin specificity pocket region. Only the mutant with a conservative Thr-141 to Ser substitution exhibited proteolytic activity, which was much reduced compared with the parent. These results, together with immunoprecipitation data which indicate that Asp-85, Thr-141, and Cys-146 lie in accessible surface regions, suggest that the catalytic mechanism of picornavirus 3C cysteine proteases is closely related to that of cellular trypsin-like serine proteases.



Human rhinoviruses (HRVs),¹ the main causative agents of the common cold, form one genus of the Picornavirus family (Stott and Killington, 1972; Gwaltney, 1975). The primary translation product of the positive stranded RNA genome of picornaviruses (e.g. HRVs, poliovirus, and foot-and-mouth disease virus) is a single precursor polypeptide which is rapidly processed by viral proteases to mature products (Nicklin et al., 1986; Krausslich and Wimmer, 1988). Proteolytic cleavage of the viral precursor protein plays an important part in the regulation of picornavirus replication. Two Tyr-Gly pairs in the precursor are cleaved by viral protease 2A (Krausslich and Wimmer, 1988). Most of the cleavages are performed by viral protease 3C (3Cpro) which

* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

§ To whom all correspondence should be sent.

‡ Present address: Dept. of Microbiology and Immunology, The University of Adelaide, Box 498, GPO, Adelaide, South Australia 5001, Australia.

¹ The abbreviations used are: HRVs, human rhinoviruses; HRV-14, human rhinovirus type 14; 3Cpro, viral protease 3C; SDS-PAGE, sodium dodecyl sulfate-polyacrylamide gel electrophoresis; KLH, keyhole limpet hemocyanin; ApR, ampicillin resistant; PBS, phosphate-buffered saline.



exhibits a preference for Gln-Gly pairs (Nicklin et al., 1986; Krausslich and Wimmer, 1988).

3Cpro from poliovirus (Hanecak et al., 1984; Ivanoff et al., 1986; Richards et al., 1987; Nicklin et al., 1988), encephalomyocarditis virus (Parks et al., 1989), foot-and-mouth disease virus (Klump et al., 1984; Strebel et al., 1986) and HRV-14 (Cheah et al., 1988; Libby et al., 1988) have been cloned and expressed in Escherichia coli. In most of these studies, the 3Cpro precursor form has been shown to cleave its flanking Gln-Gly sites to release mature 3Cpro in an autocatalytic fashion. However, cleavage at Gln-Gly to release the poliovirus capsid proteins is performed not by 3Cpro but by the 3C-3D precursor in which 3Cpro is covalently fused to the adjacent 3D polymerase (Jore et al., 1988; Ypma-Wong et al., 1988).

3Cpro activity is inhibited by cysteine protease inhibitors, indicating that cysteine may be an active-site amino acid (Korant, 1973; Pelham, 1978; Korant et al., 1985). In fact, sequence comparisons of 3C proteases from animal picornaviruses and 3C-like proteases from some plant viruses showed that only one of the cysteines (Cys-147 in poliovirus) is highly conserved in all these viruses (Argos et al., 1984; Franssen et al., 1984). Strong evidence that Cys-147 of poliovirus is an active-site amino acid came from site-directed mutagenesis studies which demonstrated that mutation of the highly conserved Cys-147 to Ser resulted in the inactivation of the protease, whereas similar mutation of the nonconserved Cys-153 had no effect (Ivanoff et al., 1986).

It was suggested on the basis of computer alignments that the viral 3C cysteine proteases may represent an evolutionary link between the cellular cysteine proteases exemplified by papain, and the cellular trypsin-like serine proteases (Gorbalenya et al., 1986). More extensive computer alignment of picornavirus 3C proteases and cellular serine proteases revealed some remarkable primary and secondary structural homologies, indicating that certain amino acids within 3Cpro, including Cys-147 (Cys-146 in HRV-14), may be responsible for catalysis or substrate binding in a mechanistically similar fashion to the cellular serine proteases (Bazan and Fletterick, 1988). His-40, Asp-85, and Cys-146 of HRV-14 3Cpro, which are completely conserved in all picornaviruses, align with His-57, Asp-102, and Ser-195 of the trypsin-like serine protease catalytic triad (Bazan and Fletterick, 1988). As a result of these alignments, Thr-141, Gly-158, and His-160 of HRV-14 3Cpro, which are also completely conserved in all picornaviruses, and Gly-162, which is conserved in HRVs and enteroviruses (e.g. poliovirus), align with the amino acids lying in or close to the specificity pocket of the cellular serine proteases (Bazan and Fletterick, 1988). In this paper, we describe introduction of single amino acid substitutions in HRV-14 3Cpro at the positions which correspond to the trypsin catalytic triad and specificity pocket. All except one of the substitutions


7180 



Mutational Analysis of a Picornauirus 3C Protease 



7181 



destroyed the proteolytic activity of SC"*. In addition, mon- 
ospecific peptide antisera raised against some of the regions 
in 30^"* corresponding to the trypsin catalytic triad and spec- 
ificity pocket, efficiently immunoprecipitated 30"°. Our re- 
sults suggest that the picoma viral 3C cysteine proteases and 
cellular serine proteases may catalyze peptide bond cleavage 
utilizing basically similar mechanisms. 

MATERIALS AND METHODS 

Oligonucleotides and Peptides: Oligonucleotides 1 to 3 (Table I)
and the sequencing primer 5' GCGTGTTGACTGGATTT 3' (HRV-14
nucleotides 5823-5839; Stanway et al., 1984) were synthesized
using a Pharmacia Gene Assembler. Oligonucleotides 4 to 9 (Table I)
were purchased from Promega. Peptide 1 (CGGGTLDRNEKFRDIR,
Fig. 1) and peptide 2 (RYDYATKTGQC, Fig. 1) were purchased from
Diagnostic Biotechnology (Singapore) and Cambridge Research
Biochemicals (United Kingdom), respectively.

Preparation and Characterization of Peptide Antisera: A
non-natural cysteine and three glycine spacers were added to the
amino terminus of the core peptide 1 sequence (TLDRNEKFRDIR) to
facilitate coupling of the peptide to the carrier protein keyhole
limpet hemocyanin (KLH) (Sigma). No additional amino acids were
introduced into peptide 2 (RYDYATKTGQC), which already has a
cysteine at the carboxyl end. Each synthetic peptide (2.5 mg) was
coupled to KLH via cysteine using
N-maleimidobenzyl-N-hydroxysuccinimide ester (Pierce Chemical Co.)
(Nivison and Hanson, 1987).

To induce anti-peptide antibodies, two rabbits were subcutaneously
inoculated with 100 µg of each of the KLH-coupled peptides mixed
with an equal volume of Freund's complete adjuvant. Subsequent
injections were carried out with the same amount of coupled peptides
emulsified in Freund's incomplete adjuvant at monthly intervals.
Sera were prepared from blood collected 2 weeks after each booster
and kept at -70 °C.

For dot blot analysis, serially diluted peptides and KLH were
spotted onto nitrocellulose membranes (0.45 µm, Sartorius) and dried.
The membranes were incubated with 5% skim milk in
phosphate-buffered saline containing 0.05% Tween 20 (PBS-T) at
22 °C for 2 h. The blocked membranes were then incubated with the
test sera diluted in PBS-T at 22 °C for 16 h. The membranes were
washed three times with PBS-T and incubated with biotinylated goat
anti-rabbit IgG (Bethesda Research Laboratories) at 22 °C for 1 h,
then washed again three times. The membranes were treated with
streptavidin-horseradish peroxidase conjugate (Bethesda Research
Laboratories) at 22 °C for 1 h, washed as before, and incubated
with 0.33% 4-chloro-naphthol in methanol and 0.018% hydrogen
peroxide in PBS.

Maxicell Labeling and Protein Analysis: Polypeptides expressed
by plasmids in E. coli maxicell strain CSR603 (Sancar et al., 1979)
were labeled with [35S]methionine (>1200 Ci/mmol, Amersham
Corp.) according to Cheah et al. (1988), except that the cell pellet
was resuspended in lysis buffer containing 50 mM Tris-HCl, pH 7.5,
30 mM NaCl, and 200 µg/ml lysozyme. Cell lysis was achieved by
three rapid freeze-thaw cycles. The lysed cells were centrifuged
for 20 min at 4 °C and the supernatant (soluble fraction) was
saved. The pellet (insoluble fraction) was resuspended in lysis
buffer. Aliquots (5 µl) of the soluble and resuspended insoluble
fractions were mixed with an equal volume of loading buffer (25 mM
Tris-HCl, pH 6.8, 3% SDS, 7.5% β-mercaptoethanol, 25% glycerol,
and 0.05% bromophenol blue), boiled for 10 min, subjected to
SDS-PAGE, and autoradiographed (Cheah et al., 1988).

Immunoprecipitation: 25 µl of antiserum, diluted in 300 µl of
immunoprecipitation buffer (50 mM Tris-HCl, pH 7.4, 150 mM NaCl,
and 2% Triton X-100), was preabsorbed with KLH and unlabeled
E. coli maxicell extract at 22 °C for 2 h. 20 µl of
[35S]methionine-labeled E. coli maxicell extract was then added to
the preabsorbed antiserum and mixed at 4 °C for 17 h. 100 µl of
protein A-Sepharose CL-4B (Pharmacia LKB Biotechnology Inc.) was
added, mixed for a further 1 h, and centrifuged. The pellet was
washed three times with immunoprecipitation buffer and 10 mM
Tris-HCl, pH 7.5, resuspended in 50 µl of loading buffer, boiled
for 10 min, and analyzed by SDS-PAGE.

For the analysis of gel-purified polypeptides,
[35S]methionine-labeled polypeptides were separated by SDS-PAGE
(Cheah et al., 1988). The gel was rinsed with NT buffer (25 mM
Tris-HCl, pH 7.4, and 25 mM NaCl), immediately dried, and
autoradiographed. The areas of the gel corresponding to the 3Cpro
precursor and the 20-kDa 3Cpro were cut out and soaked in NT
buffer at 4 °C for 17 h. The supernatant, containing diffused
proteins, was immunoprecipitated as described above and analyzed
by SDS-PAGE.

Site-directed Mutagenesis and DNA Sequencing: The mutagenesis
protocol was essentially as described by Kunkel et al. (1987) using
the Muta-Gene M13 in vitro mutagenesis kit (Bio-Rad). First an M13
recombinant was constructed, consisting of the entire plasmid
pKCC110 (Cheah et al., 1988) subcloned in the PstI site of
bacteriophage M13 mp19 to give pLC177. To prevent deletion of the
insert, a plaque picked directly from the transformation was grown
for 6 h in 6 ml of 2 x TY medium, and the single-stranded DNA was
purified as follows: 5 ml of culture supernatant from a 10-min
centrifugation was mixed with 0.65 ml of 20% polyethylene glycol
6000 and 2.5 M NaCl. After 15 min at 22 °C, the phage was collected
by centrifugation (10 min) and the pellet dissolved in 250 µl of
20 mM Tris-HCl, pH 8.0, 1 mM EDTA. DNA was isolated by two phenol
extractions and one chloroform extraction, then precipitated with
ethanol.
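
As a quick arithmetic check on the phage precipitation step, the sketch below works out the final polyethylene glycol and NaCl concentrations after 0.65 ml of the 20% PEG 6000 / 2.5 M NaCl stock is mixed with 5 ml of culture supernatant. It assumes that volumes are additive and that the supernatant contributes no PEG or NaCl of its own (the salt already present in the medium is ignored); the function name is illustrative and not part of the original protocol.

```python
def final_concentration(stock_conc, stock_volume_ml, sample_volume_ml):
    """Concentration of a stock component after it is diluted into a sample.

    Assumes additive volumes and that the sample itself contains none
    of the component (an idealization, see the text above).
    """
    total_volume_ml = stock_volume_ml + sample_volume_ml
    return stock_conc * stock_volume_ml / total_volume_ml


supernatant_ml = 5.0   # culture supernatant
stock_ml = 0.65        # 20% PEG 6000 / 2.5 M NaCl stock

peg_percent = final_concentration(20.0, stock_ml, supernatant_ml)
nacl_molar = final_concentration(2.5, stock_ml, supernatant_ml)

print(f"PEG 6000: {peg_percent:.2f}%")
print(f"NaCl: {nacl_molar:.3f} M")
```

Under these assumptions the mixture comes to roughly 2.3% PEG 6000 and 0.29 M NaCl.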

The template DNA for mutagenesis, uracil-enriched pLC177
single-stranded DNA, was obtained by retransforming the recombinant
single-stranded phage DNA (pLC177) into the Dut- Ung- E. coli
strain CJ236 (Kunkel et al., 1987), and purifying the
single-stranded DNA as above.

The annealing of the mismatching oligonucleotides (Table I) to
the template DNA and polymerization with T4 DNA polymerase in
the presence of T4 gene 32 protein were performed essentially
according to the manufacturer's instructions (Bio-Rad Muta-Gene
kit), except that the polymerization reaction was incubated at
25 °C for 18 h following the recommended incubations at 4, 25, and
37 °C. The resultant closed, circular DNA was transformed into the
Ung+ E. coli strain MV1190 and four independent plaques from each
mutagenesis mixture were screened for the correct mutation by
dideoxy sequencing



Table I

Mutations generated by site-directed mutagenesis

No.  Sequence of mutagenic oligonucleotide (5'→3')  Location on       Amino acid substitution (b)  Predicted role of
                                                    HRV-14 cDNA (a)                                amino acid (c)

1.   CACCTCCAGACTGCCCAG                             5663-5680         Cys-146→Ser (pAC304)         Catalysis
2.   CACAGCACACCTCCCATCTGCCCAGTTTTTG                5657-5687         Cys-146→Met (pAC305)         Catalysis
3.   CACAGCACACCTCCAGTCTGCCCAGTTTTTG                5657-5687         Cys-146→Thr (pAC306)         Catalysis
4.   GCTGTGCGTCTGTGGGTATC                           5343-5362         His-40→Asp (pAC307)          Catalysis
5.   CCCTGATAGCTCTGAATTTTTC                         5476-5497         Asp-85→Ala (pAC308)          Catalysis
6.   CCCAGTTTTTGATGCATAATCATAAC                     5642-5667         Thr-141→Ser (pAC309)         Base of specificity pocket
7.   CAACATGAATATCAAAGATCTTAC                       5696-5719         Gly-158→Asp (pAC310)         Highly conserved
8.   CGCCAACATTAATACCAAAGATC                        5700-5722         His-160→Asn (pAC311)         Side of specificity pocket
9.   CTTCCATTACCGTCAACATGAATAC                      5708-5732         Gly-162→Asp (pAC312)         Top of specificity pocket

(a) Nucleotide number shown is based on the published HRV-14 sequence (Stanway et al., 1984).
(b) Plasmid names are shown in parentheses (see text for details).
(c) According to the alignment with trypsin (Bazan and Fletterick, 1988).
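
One simple consistency check on Table I is that the length of each oligonucleotide should equal the number of nucleotides spanned by its stated location. The sketch below performs that check. The coordinates for oligonucleotides 3 and 5 are entered as 5657-5687 and 5476-5497; this reading is an assumption, chosen because it is the one at which the span matches the oligonucleotide length (and, for oligonucleotide 3, the location of oligonucleotide 2). The variable and function names are illustrative only.

```python
# oligonucleotide number -> (sequence 5'->3', first nucleotide, last nucleotide)
TABLE_I = {
    1: ("CACCTCCAGACTGCCCAG", 5663, 5680),
    2: ("CACAGCACACCTCCCATCTGCCCAGTTTTTG", 5657, 5687),
    3: ("CACAGCACACCTCCAGTCTGCCCAGTTTTTG", 5657, 5687),  # assumed reading
    4: ("GCTGTGCGTCTGTGGGTATC", 5343, 5362),
    5: ("CCCTGATAGCTCTGAATTTTTC", 5476, 5497),           # assumed reading
    6: ("CCCAGTTTTTGATGCATAATCATAAC", 5642, 5667),
    7: ("CAACATGAATATCAAAGATCTTAC", 5696, 5719),
    8: ("CGCCAACATTAATACCAAAGATC", 5700, 5722),
    9: ("CTTCCATTACCGTCAACATGAATAC", 5708, 5732),
}


def span_length(first, last):
    """Number of nucleotides covered by an inclusive coordinate range."""
    return last - first + 1


# Collect the oligonucleotides whose length disagrees with their coordinates.
mismatches = [
    number
    for number, (sequence, first, last) in TABLE_I.items()
    if len(sequence) != span_length(first, last)
]

print("Oligonucleotides with inconsistent coordinates:", mismatches)
```

With the coordinates as entered here, the list of mismatches is empty.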



(Sanger et al., 1977) using the primer 5' GCGTGTTGACTGGATTT 3'.

To regenerate plasmids equivalent to the parental plasmid
pKCC110, the mutant derivatives of pLC177 were digested with PstI
(Amersham Corp.), and the linear DNA was allowed to self-ligate.
The DNA was transformed into E. coli strain MC1022 and
ampicillin-resistant (ApR) transformants were selected (Maniatis
et al., 1982). Finally, the mutant plasmid DNAs were retransformed
into E. coli CSR603 maxicells for analysis of plasmid-encoded
proteins (see above).

RESULTS 

Immunoprecipitation of 3Cpro and Its ~55-kDa Precursor:
The predicted HRV-14 3Cpro amino acid sequence (Stanway
et al., 1984) was analyzed for short peptide regions with a
good potential for inducing antibodies that would recognize
surface epitopes in 3Cpro (Garnier et al., 1978; Lerner, 1984).
The analysis predicted that amino acids 76 to 87 and 136 to
146 (peptides 1 and 2, respectively, Fig. 1) lie in hydrophilic
turn regions in the protein, which is in agreement with Werner
et al. (1986). These peptides were therefore chosen for raising
antisera. Two rabbits were independently immunized with
each peptide coupled with KLH. Sera from each pair of rabbits
reacted with the homologous peptide in a dot blot assay, and
no cross-reactivity was detected with the heterologous peptides.
Preimmune sera from all four rabbits gave no reaction
with either peptide (not shown).

We have previously reported the construction of an HRV-14
expression plasmid, pKCC110, which codes for 3Cpro plus some
flanking viral sequences. In E. coli maxicells, pKCC110 encodes
a unique precursor polypeptide of ~55 kDa, which was suggested
on the basis of its size to comprise the carboxyl-terminal
portion of the viral RNA-linked protein VPg (3B), the entire
3Cpro, and the amino-terminal half of the viral polymerase 3D
(shown as an overlined 3D) (Fig. 1; Cheah et al., 1988). The
~55-kDa 3Cpro precursor is rapidly processed to several
polypeptides, including 3Cpro migrating at ~20 kDa (Cheah et al.,
1988).

Fig. 2A shows that in extracts of [35S]methionine-labeled
E. coli maxicells harboring pKCC110, 3Cpro and the ~55-kDa
3Cpro precursor are more abundant in the insoluble pellet than
in the lysozyme (soluble) extract (Fig. 2A, compare lanes 2
and 3). A background protein comigrating with the ~55-kDa
band is occasionally detected in the soluble fraction of
maxicells carrying the vector pKCC100 (Fig. 2A, lane 4).

Immunoprecipitation experiments using the soluble fraction
(lysozyme supernatant; Fig. 2A, lane 3) demonstrated
that peptide 1 and 2 antisera specifically recognize the 20-kDa
3Cpro polypeptide (Fig. 2B, lanes 2 and 5), whereas the
preimmune sera did not (Fig. 2B, lanes 3 and 6). The ~55-kDa
3Cpro precursor from the soluble fraction of E. coli was
not immunoprecipitated by either peptide antiserum (Fig. 2B,
lanes 2 and 5).

To circumvent the lack of immunoprecipitation of the ~55-kDa
3Cpro precursor protein, the [35S]methionine-labeled proteins
encoded by pKCC110 in E. coli maxicells were separated
by SDS-PAGE, and the gel was immediately dried and
autoradiographed without fixing the proteins. The regions
corresponding to the ~55-kDa 3Cpro precursor and 3Cpro (Fig. 2A,
lane 1) were excised from the dried gel and eluted by diffusion
at 4 °C. The eluted proteins were either rerun on a second
SDS-polyacrylamide gel (Fig. 2C, lanes 1 and 6) or incubated
with peptide 1 and 2 antisera and immunoprecipitated. Both
peptide antisera immunoprecipitated the ~55-kDa 3Cpro precursor
(Fig. 2C, lanes 2 and 3) and 3Cpro (Fig. 2C, lanes 7 and
8), whereas preimmune sera did not (Fig. 2C, lanes 4, 5, 9,
and 10). Further, the immunoprecipitation of the gel-purified
3Cpro precursor by both peptide antisera was inhibited by prior
absorption of the peptide antisera with 10 µg of the homologous
peptide (not shown).

Taken together, the immunoprecipitation experiments confirmed
our previous assignment of the ~55- and ~20-kDa
polypeptides as 3Cpro precursor and 3Cpro, respectively (Cheah
et al., 1988) and clearly indicate that amino acids 76 to 87 and
136 to 146 are surface epitopes of 3Cpro (Fig. 1).

Construction of 3Cpro Mutants: Computer alignments of
animal picornavirus 3C proteases and cellular serine proteases
have indicated a limited number of significant homologies.
The presumed active-site Cys-147 of poliovirus 3Cpro, equiv-



[Figure 1 diagram: HRV-14 cDNA insert running from ATG to TAA, with
peptides P1 and P2 marked within 3C; products shown are 3B (2.2 kd),
3C (20 kd), 3D (31 kd), 3C-3D (52.8 kd), and 3B-3C-3D (55 kd).]

Fig. 1. Schematic diagram showing the HRV-14 portion of recombinant plasmid pKCC110. The
heavy blackened line represents the cDNA of HRV-14 cloned in the trp promoter expression vector pKCC100, and
the hatched box depicts the 19 amino acids derived from vector sequences fused in frame to the HRV-14 open
reading frame (Cheah et al., 1988). The proposed Gln/Gly cleavage sites flanking 3Cpro are shown as Q/G(2) and
Q(182)/G (Stanway et al., 1984; Cheah et al., 1988). Peptide sequences chosen for raising antibodies, shown as open
boxes, are P1 (peptide 1, amino acids 76 to 87 with an amino-terminal extension of Cys-Gly-Gly-Gly) and P2
(peptide 2, amino acids 136 to 146). The full sequences of the peptides are given under "Materials and Methods."
The locations of the amino acids substituted by site-directed mutagenesis are shown in single-letter code (see text
and Table I for details). The viral proteins and their precursors (3B, 3C, 3D, 3C-3D, and 3B-3C-3D) are shown
with the estimated sizes in parentheses (Stanway et al., 1984; Cheah et al., 1988). Truncated proteins are indicated
by overlining (e.g. an overlined 3D).



Mutational Analysis of a Picornavirus 3C Protease 7183 

[Figure 2 autoradiographs: panel A, lanes 1-4; panel B, lanes 1-6;
panel C, lanes 1-5 (3C precursor) and lanes 6-10 (3C).]

Fig. 2. Protein analysis. A, autoradiograph of a 12.5% SDS-polyacrylamide gel showing [35S]methionine-
labeled HRV-14 polypeptides synthesized in E. coli CSR603 maxicells. Lane 1, pKCC110 (whole lysate); lane 2,
pKCC110 (solubilized pellet fraction); lane 3, pKCC110 (soluble fraction extracted with lysozyme); lane 4, vector
pKCC100 without insert (soluble fraction extracted with lysozyme). Unique polypeptides encoded by recombinant
plasmid pKCC110 are indicated on the left (Fig. 1; Cheah et al., 1988). Bla is β-lactamase. B, immunoprecipitation
of protease 3C by peptide antisera. [35S]Methionine-labeled soluble proteins encoded by pKCC110 were either
loaded directly on the SDS-polyacrylamide gel (lanes 1 and 4), immunoprecipitated with peptide 1 antiserum (lane
2), or immunoprecipitated with peptide 2 antiserum (lane 5). Lanes 3 and 6 are identical to lanes 2 and 5,
respectively, except that preimmune sera were used. The arrowheads on the right of panels A and B indicate the
positions of size standards from top to bottom of sizes 68, 43, 25.7, and 18.4 kDa. C, immunoprecipitation of SDS-
polyacrylamide gel-purified 3Cpro precursor (left panel) and 3Cpro (right panel). The regions in the gel (Fig. 2A, lane
1) corresponding to the 3Cpro precursor and 3Cpro were excised, and the proteins were eluted and analyzed on a
12.5% SDS-polyacrylamide gel. Lanes 1 and 6, proteins loaded directly; lanes 2 and 7, immunoprecipitation with
peptide 1 antiserum; lanes 3 and 8, immunoprecipitation with peptide 2 antiserum; lanes 4, 5, 9, and 10,
immunoprecipitation with preimmune sera.



[Figure 3 alignment: HRV-14 3Cpro sequence shown above trypsin; HRV-14
residues 158, 160, and 162 are marked above trypsin residues 211, 213,
and 215.]

Fig. 3. Proposed alignment of catalytic and specificity pocket amino acids of trypsin and HRV-14
3Cpro. Computer alignment of the catalytic triad and specificity pocket amino acids of trypsin (each marked by
its own symbol in the figure) with the corresponding residues of HRV-14 3Cpro is shown (Bazan and Fletterick,
1988). Amino acids in HRV-14 3Cpro substituted by site-directed mutagenesis (Fig. 1, Table I) are shown in bold
type. Based on our results, T-141 and not A-140 of 3Cpro may be equivalent to D-189 of trypsin (see
"Discussion"). Identical amino acids are boxed.



alent to Cys-146 in HRV-14 3Cpro, is highly conserved in all
animal picornaviruses and lies in an area of significant
homology with the active-site Ser-195 of trypsin-like serine
proteases (Gorbalenya et al., 1986; Bazan and Fletterick,
1988). In addition, His-40 and Asp-85 of HRV-14 (Stanway
et al., 1984) are highly conserved in animal picornaviruses
and cellular serine proteases. His-40, Asp-85, and Cys-146 of
HRV-14 can be superimposed on the trypsin serine protease
catalytic triad, His-57, Asp-102, and Ser-195 (Fig. 3; Kraut,
1977; Craik et al., 1987; Sprang et al., 1987). Therefore,
substitutions were made individually at His-40 and Asp-85,
and three different substitutions were made at Cys-146 to test
whether these amino acids are essential for the catalytic
function of 3Cpro (Table I, Fig. 1).
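
The three Cys-146 substitutions can be read directly from oligonucleotides 1 to 3 of Table I. The sketch below assumes that the mutagenic oligonucleotides are written as the complement of the coding strand; this is not stated explicitly in the text and is inferred because only the reverse complement translates to the Gly-Gln pair that precedes Cys-146 in peptide 2 (RYDYATKTGQC, amino acids 136 to 146). It uses the standard genetic code, translates the reverse complement in all three frames, finds the Gly-Gln pair, and reports the residue that follows it, which corresponds to position 146. Function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, frame):
    """Translate complete codons of seq starting at the given frame offset."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def residue_after_gly_gln(oligo):
    """One-letter residue following the Gly-Gln pair (position 146)."""
    coding = reverse_complement(oligo)
    for frame in range(3):
        protein = translate(coding, frame)
        index = protein.find("GQ")
        if index != -1 and index + 2 < len(protein):
            return protein[index + 2]
    return None


# Oligonucleotides 1 to 3 from Table I, keyed by the resulting plasmid.
OLIGOS = {
    "pAC304": "CACCTCCAGACTGCCCAG",
    "pAC305": "CACAGCACACCTCCCATCTGCCCAGTTTTTG",
    "pAC306": "CACAGCACACCTCCAGTCTGCCCAGTTTTTG",
}

substitutions = {name: residue_after_gly_gln(seq) for name, seq in OLIGOS.items()}
print(substitutions)
```

Under the stated assumption, the residues recovered are Ser, Met, and Thr for pAC304, pAC305, and pAC306, in agreement with Table I.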



The computer alignments also revealed that HRV-14 3Cpro
amino acids Thr-141, His-160, and Gly-162 lie in positions
equivalent to serine protease amino acids known to be important
for substrate binding and specificity (Fig. 3; Kraut, 1977;
Bazan and Fletterick, 1988). In trypsin, the equivalent amino
acids are serine, valine, and tryptophan, respectively (Fig. 3).
Thr-141 and His-160 are highly conserved in picornaviruses,
while Gly-162 is only partially conserved. Two lines of evidence
suggest that these 3 residues are among those which
are important determinants of Gln-Gly cleavage specificity.
First, molecular modeling of His-160/Gly-162 in the pocket
of a trypsin-inhibitor complex structure revealed possible
hydrogen-bonding interactions between viral Thr-141/His-160
and the enzyme-bound side chain of the Gln substrate
(designated S1 position) (Kraut, 1977; Bazan and Fletterick,
1988). Second, Staphylococcus aureus (strain V8) protease,
which is a serine protease with a specificity for Glu in the S1
pocket, has a Thr-141/His-160/Gly-162 complement of residues
(Drapeau, 1978; Bazan and Fletterick, 1988). Thus,
changes were made individually at Thr-141, His-160, and Gly-162
(summarized in Table I) to test whether these residues
are essential for cleavage at Gln-Gly. In addition, Gly-158 was
chosen for mutagenesis as an example of a very highly conserved
residue occurring in the vicinity of the predicted specificity
pocket (Fig. 3, Table I).

Single amino acid substitutions in HRV-14 3Cpro were generated
via site-directed mutagenesis (Kunkel et al., 1987)
using synthetic oligonucleotide primers (Table I). The
single-stranded DNA template was prepared by subcloning the entire
PstI-linearized plasmid pKCC110 into the PstI site of M13
mp19 to give pLC177 (Fig. 4). Following site-directed mutagenesis
and DNA sequencing, the M13 mp19 segment of
pLC177 derivatives bearing mutations in 3Cpro was deleted by
PstI digestion, followed by self-ligation for the reconstruction
of the ApR gene and selection for ApR transformants. This
manipulation regenerated mutant plasmids equivalent to
pKCC110 (pAC series, Fig. 4; Table I).

[Figure 4 flow chart: Site-Directed Mutagenesis and Sequence Analysis;
Pst I Digestion of pLC177 Mutant Derivatives; self-ligation and
selection for ApR; pAC series (ApR).]

Fig. 4. Scheme for site-directed mutagenesis. The recombinant
plasmid coding for 3Cpro and flanking sequences (pKCC110,
blackened lines) and M13 mp19 (double lines) were digested with PstI
and ligated together, yielding pLC177. The open arrowheads denote
the trp promoter and ribosome-binding site of pKCC110 (Cheah et
al., 1988). Site-directed mutagenesis and sequencing of mutants are
described in detail under "Materials and Methods." The mutant
derivatives of pLC177 were digested with PstI, and the DNA was
allowed to self-ligate, generating the pAC series of mutant plasmids
(Table I). The asterisk denotes a site-specific 3Cpro mutant.

Expression and Proteolytic Activity of Mutant 3C Proteases:
The expression in E. coli of 3C proteases linked to
the adjacent upstream and downstream viral flanking sequences
provides an immediate assay for the activity of the
protease (Hanecak et al., 1984; Klump et al., 1984; Cheah et
al., 1988). The precursor form of HRV-14 3Cpro releases mature
3Cpro by autocatalytic proteolysis (Stanway et al., 1984;
Cheah et al., 1988; Figs. 1 and 2A). It is most likely that
HRV-14 3Cpro is released by proteolysis at its flanking Gln-Gly
sites as found for poliovirus, since it has been shown that short
synthetic peptides are efficiently cleaved at Gln-Gly by cloned
HRV-14 3Cpro (Libby et al., 1988).

A comparison of the expression of parental and mutant
HRV-14 3Cpro precursors in E. coli maxicells is presented in
Fig. 5. In the case of the parental 3Cpro, significant processing
of the 3Cpro (55 kDa) precursor to 3D (31 kDa) and 3Cpro (20
kDa) was observed during the 1-h labeling period (Fig. 5, lanes
2 and 6; Cheah et al., 1988). The doublet migrating at ~46
kDa probably consists of unrelated plasmid-encoded proteins,
since it is present in the vector control (Fig. 5, lane 1) and
the yields are highly variable (see also Fig. 6). All nine mutant
plasmids, each of which codes for a single amino acid substitution
in 3Cpro, expressed a precursor polypeptide of identical
size (55 kDa), which migrated slightly more slowly than the 3Cpro
precursor encoded by the parent plasmid pKCC110 (Fig. 5,
lanes 3-5 and 7-12). However, none of the mutant precursors,
with the exception of the Thr-141 to Ser mutant, were cleaved
to 3D and mature 3Cpro, demonstrating that their catalytic
function had been destroyed. The fact that eight independent
point mutations at six amino acid positions completely inhibit
processing at two Gln-Gly sites makes it highly unlikely that
E. coli proteases are involved in specific proteolysis of the
parental 3Cpro precursor in the E. coli maxicell system.
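
The counts quoted in this paragraph (nine mutant plasmids, eight point mutations that abolish processing, six positions affected) follow from Table I together with the lane assignments of Fig. 5. The short tally below makes that bookkeeping explicit. The phenotype labels "inactive" and "impaired" are shorthand introduced here for illustration and are not terms taken from the figures.

```python
from collections import Counter

# (plasmid, position in 3Cpro, original residue, substituted residue, phenotype)
MUTANTS = [
    ("pAC304", 146, "Cys", "Ser", "inactive"),
    ("pAC305", 146, "Cys", "Met", "inactive"),
    ("pAC306", 146, "Cys", "Thr", "inactive"),
    ("pAC307", 40, "His", "Asp", "inactive"),
    ("pAC308", 85, "Asp", "Ala", "inactive"),
    ("pAC309", 141, "Thr", "Ser", "impaired"),
    ("pAC310", 158, "Gly", "Asp", "inactive"),
    ("pAC311", 160, "His", "Asn", "inactive"),
    ("pAC312", 162, "Gly", "Asp", "inactive"),
]

# Number of mutants per phenotype.
phenotype_counts = Counter(phenotype for *_, phenotype in MUTANTS)

# Distinct positions at which a substitution abolished processing.
inactive_positions = sorted(
    {position for _, position, _, _, phenotype in MUTANTS if phenotype == "inactive"}
)

# All distinct positions that were mutated.
all_positions = sorted({position for _, position, _, _, _ in MUTANTS})

print("Mutants per phenotype:", dict(phenotype_counts))
print("Positions where processing was abolished:", inactive_positions)
print("Total positions mutated:", len(all_positions))
```

The tally gives eight inactive mutants at six positions plus the single impaired Thr-141 to Ser mutant, for seven mutated positions in all.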

The Thr-141 to Ser mutation severely impairs processing,
since very little 3D and 3Cpro were detected (Fig. 5, lane 12).
The 3Cpro (Ser-141) precursor occurred as a doublet with
bands of equal intensity, unlike the other mutants, which
expressed only the upper band (Fig. 5, compare lane 12 with lanes
7-11). These observations provide an explanation for the
parental 3Cpro precursor migrating slightly faster in
SDS-polyacrylamide gels than the proteolytically inactive mutant
3Cpro precursors (Fig. 5, e.g. compare lanes 2 and 6 with lanes
3 and 7). With the parental 3Cpro precursor (55 kDa), fast
cleavage at the 3B/3C junction and slower cleavage at the 3C/3D
junction (Fig. 1) result in the accumulation of a 52.8-kDa
3C-3D precursor (Fig. 5, lanes 2 and 6). In support of this
explanation, a longer autoradiographic exposure of lanes 2
and 6 of the gel shown in Fig. 5 revealed the presence of the
authentic 55-kDa parental precursor comigrating with the 55-kDa
precursor of the proteolytically inactive mutants (not
shown). Therefore, the 3Cpro precursor encoded by pKCC110,
previously designated "~55 kDa," most probably consisted of
the 52.8-kDa 3C-3D precursor and a small amount of 55-kDa
3B-3C-3D (Fig. 1; Cheah et al., 1988). The longer exposure of
the gel shown in Fig. 5 also did not reveal detectable 52.8-
(3C-3D), 31- (3D), or 20-kDa (3Cpro) bands with the
proteolytically inactive mutants, confirming that the catalytic
function of 3Cpro had been destroyed.

[Figure 5 autoradiograph: lanes 1-12.]

Fig. 5. Polypeptides encoded by protease 3C mutant plasmids. The [35S]methionine-labeled polypeptides
in the whole extracts of E. coli CSR603 harboring various recombinant plasmids (Table I) were separated by SDS-
PAGE. Lane 1, the vector pKCC100; lane 2, pKCC110 (parent); lane 3, pAC304 (Cys-146 to Ser); lane 4, pAC305
(Cys-146 to Met); lane 5, pAC306 (Cys-146 to Thr); lane 6, pKCC110 (parent); lane 7, pAC307 (His-40 to Asp);
lane 8, pAC310 (Gly-158 to Asp); lane 9, pAC311 (His-160 to Asn); lane 10, pAC312 (Gly-162 to Asp); lane 11,
pAC308 (Asp-85 to Ala); lane 12, pAC309 (Thr-141 to Ser). Arrows on the right show the positions of protein
markers with sizes from top to bottom of 68, 43, 25.7, and 18.4 kDa. Indicated on the left are the pKCC110-encoded
viral polypeptides, 3B-3C-3D (55 kDa), 3C-3D (52.8 kDa), 3D (31 kDa), and 3C (20 kDa) (Fig. 1). Bla is
β-lactamase.

Pulse-chase Analysis of Polypeptides Expressed by Mutant
Plasmids: To examine whether the mutant 55-kDa precursors
exhibit 3Cpro catalytic activity during prolonged incubations,
a series of pulse-chase experiments was performed. Fig.
6A shows that following a 2-min [35S]methionine pulse and a
4-h chase with unlabeled methionine and chloramphenicol,
nearly all the parental 3Cpro precursor was processed to 3D
and 3Cpro (see also Fig. 5 of Cheah et al., 1988). In contrast,
no processing of the 3B-3C-3D precursor to 3C-3D, 3D, and
3Cpro was detected with the Asp-85 to Ala mutant, even during
an 18-h chase period (Fig. 6B). An identical result was obtained
with the His-40 to Asp, Cys-146 to Ser, Cys-146 to
Met, Cys-146 to Thr, Gly-158 to Asp, His-160 to Asn, and
Gly-162 to Asp mutants (not shown). With the Thr-141 to
Ser mutant, the 3B-3C-3D/3C-3D doublet was processed during
the chase period to 3D and a 3Cpro mutant polypeptide
(Fig. 6C), albeit at a much slower rate than that of the parental
3Cpro precursor (Fig. 6A). These results strengthen our
conclusion that mutations at six amino acid positions totally
inactivate 3Cpro, and mutation of Thr-141 to Ser severely
impairs 3C proteolytic activity.

DISCUSSION 

We have previously utilized the E. coli maxicell system to
demonstrate expression and autocatalytic proteolysis of an
HRV-14 3Cpro precursor (Cheah et al., 1988). In the present
study, the parental and mutant 3Cpro precursors were expressed
at comparable levels in E. coli maxicells, but the
parental precursor migrated slightly faster in denaturing gels
than the proteolytically inactive mutant precursors (Fig. 5).
This is because cleavage of the parental 3B-3C-3D precursor
is much faster at the 3B/3C junction than at the 3C/3D
junction, resulting in the accumulation of a 3C-3D precursor
of 52.8 kDa (Fig. 1). In other picornaviruses, cleavage at 3B/3C
has also been reported to be faster than cleavage at 3C/3D
(Strebel et al., 1986; Richards et al., 1987; Jore et al., 1988).
In vivo, a slow cleavage at 3C/3D would control the release of
mature 3Cpro and at the same time provide an adequate supply
of 3C-3D, the active protease required for cleavage of the
capsid protein precursors (Jore et al., 1988; Ypma-Wong et
al., 1988).

The E. coli maxicell system has for the first time provided
a sensitive, convenient, and rapid way of assaying the effects
of single amino acid substitutions on the proteolytic activity
of autocatalytic proteases. Seven amino acid positions in
HRV-14 3Cpro were chosen for site-directed mutagenesis based
on two considerations. First, amino acids at all seven positions
are highly conserved in animal picornaviruses. Second, an
alignment with trypsin predicted that certain 3Cpro residues
may be involved either in catalysis or substrate binding and
specificity (Fig. 3; Bazan and Fletterick, 1988). It has
previously been shown that the Cys-147 to Ser mutation inactivates
poliovirus 3Cpro, although it was not clear whether residual
proteolytic activity remained (Ivanoff et al., 1986). Here we
show that if Cys-146 of HRV-14 3Cpro (equivalent to poliovirus
Cys-147) was changed either to serine, methionine, or threonine,
proteolytic activity was completely destroyed. Likewise,
mutation of His-40 to Asp or Asp-85 to Ala, which are
equivalent to His-57 and Asp-102 in the catalytic triad of the
trypsin-like serine proteases, completely destroyed 3Cpro
activity. Two different antisera raised against peptides
containing 3Cpro amino acids 76 to 87 and 136 to 146 efficiently
immunoprecipitated mature 3Cpro, strongly suggesting that
Asp-85 and Cys-146 lie in accessible surface locations in 3Cpro.
Taken together, the site-directed mutagenesis and
immunoprecipitation data suggest that catalysis by HRV-14 3Cpro is
performed by a surface triad of His-40, Asp-85, and Cys-146
in a mechanistically similar fashion to the histidine, aspartic
acid, and serine at the active site of the trypsin-like serine
proteases (Fig. 3; Kraut, 1977; Craik et al., 1987).

A very recent independent alignment of viral cysteine and
cellular serine proteases (Gorbalenya et al., 1989) is largely in
agreement with the analysis of Bazan and Fletterick (1988),
except that Glu-71 and not Asp-85 was suggested to represent
the acidic amino acid in the catalytic triad of HRV-14 and
most other picornavirus 3C proteases. Although a glutamic
acid has never been found in the serine protease catalytic
triad and some 3C proteases have Asp-71, the participation
of position 71 in the catalytic triad of 3C cysteine proteases
cannot be ruled out.

[Figure 6 autoradiographs: pulse-chase panels; the panel label pKCC110
and a chase time scale in hours ending at 18 hr are legible.]

Fig. 6. Kinetics of cleavage of parent and mutant protease
3C precursors. Viral polypeptides expressed in UV-irradiated E.
coli maxicells were labeled with [35S]methionine for 2 min and chased
for the times indicated at 37 °C in the presence of excess unlabeled
methionine and chloramphenicol (Cheah et al., 1988). Panel A,
pKCC110 (parent); panel B, pAC308 (Asp-85 to Ala); panel C, pAC309
(Thr-141 to Ser). Arrows show the positions of protein markers with
sizes from top to bottom of 68, 43, 25.7, and 18.4 kDa. Indicated on
the left are the viral polypeptides, 3B-3C-3D (55 kDa), 3C-3D (52.8
kDa), 3D (31 kDa), and 3C (20 kDa) (Fig. 1). Bla is β-lactamase.

Amino acids in viral 3C proteases predicted to be involved
in determining Gln-Gly cleavage specificity include the HRV-14
residues Ala-140, Thr-141, Gly-158, His-160, and Gly-162
(Fig. 3; Bazan and Fletterick, 1988; Gorbalenya et al., 1989).
Ala-140 in HRV-14 3Cpro aligns with Asp-189 of trypsin, an
important determinant of Arg/Lys cleavage specificity located
at the base of the substrate binding pocket (Graf et al., 1987).
However, Ala-140 is unlikely to be directly involved in 3Cpro
specificity, since other picornaviruses have the functionally
dissimilar residues Gln, Asn, Glu, or Pro in this position. We
found that Gly-158 to Asp, His-160 to Asn, and Gly-162 to
Asp substitutions abolished 3Cpro activity, supporting the
proposal that each of the amino acids in these positions plays
a crucial role in cleavage specificity (Bazan and Fletterick,
1988). Consistent with our results, when His-161 of poliovirus
3Cpro (equivalent to His-160 of HRV-14) was converted to a
glycine, proteolytic activity was also lost (Ivanoff et al.,
1986). The Thr-141 to Ser mutation in HRV-14 3Cpro markedly
reduced its activity. Our immunoprecipitation data suggest
that Thr-141 lies in an accessible surface region and, as
discussed earlier, Thr-141 could form a hydrogen bond with
the side chain of the S1-bound Gln substrate. In theory, Ser-141
could similarly form a hydrogen bond, but the interaction
would be weaker, since serine has a shorter side chain than
threonine. A weaker interaction might explain the impaired
activity of the Ser-141 mutant. Based on these considerations,
we speculate that Thr-141 and not Ala-140 of 3Cpro is equivalent
to the important Asp-189 of trypsin (Fig. 3; Graf et al.,
1987).

It is remarkable that substitutions at six positions in 3Cpro
completely destroyed proteolytic activity, and one additional
substitution (Thr-141 to Ser) severely impaired activity. It
could be argued that 3C proteases are highly sensitive to
structural changes. Although we cannot exclude this possibility,
there are two considerations which argue against it. First,
some substitutions in poliovirus 3Cpro are without effect
(Ivanoff et al., 1986; Dewalt and Semler, 1987). Second, the 3C
proteases of two related HRV subtypes, HRV-2 and HRV-14,
are less than 50% homologous, and structurally dissimilar
amino acids align at many positions (Stanway et al., 1984;
Skern et al., 1985).

We have demonstrated that seven amino acids which are 
highly conserved in the 30 proteases of animal picomaviruses 
are important for the proteolytic activity of HRV-14 30**"*. 
These amino acids align with catalytic or specificity pocket 
residues of trypsin, suggesting that the catalytic mechanism 
utilized by picornavirus 30 cysteine proteases is closely re- 
lated to that of the cellular trypsin-like serine proteases. This 
is interesting because trypsin and chymotrypsin are inactive 
as precursors, which is in sharp contrast to the viral 30 
proteases. Also, unlike the cellular serine proteases, the viral 
30 cysteine proteases are believed to cleave both in cis and in 
trans (Krausslich and Wimmer. 1988). The question of 
whether the mechanisms of cis and trans catalysis are differ- 
ent has not yet been addressed. 

If the 30 cysteine proteases and cellular serine proteases 
are structurally and functionally related, it may be possible 
to convert a viral 30 cysteine protease to a serine protease by 
substituting a limited set of amino acids to compensate for 
the Oys-146 to Ser change, which by itself inactivates 30^"*. 
Support for this concept comes from the observation men- 
tioned earlier that S. aureus (strain V8) protease is a serine 
protease which cleaves after Glu residues and has a Thr-141/ 
His-160/Gly-162 complement of amino acids in the substrate- 
binding pocket (Drapeau. 1978; Bazan and Fletterick, 1988). 
In addition, animal flaviviruses and pestiviruses code for 
30^"*-like serine proteases with Arg/Lys cleavage specificity 
and only limited homology with the trypsin class of serine 
proteases in and around the substrate-binding pocket (Bazan 
and Fletterick. 1989). 

In conclusion, our site-directed mutagenesis results com- 
bined with a knowledge of the physicochemical properties of 
purified 30 proteases together with x-ray crystal structure 
data, will lead to a better understanding of the catalytic 
mechanism utilized by this unusual class of proteases. 

Acknowledgments - We are grateful to Dr. Gerd Klock and Woon-Khiong 
Chan for critically reading the manuscript, Sabita Sankar for 
assistance in the preparation of peptide antisera, Mei-Yeng Kok for 
oligonucleotide synthesis, Ka-Liong Lok for photography, and Azizah 
Mohd Ali for typing the manuscript. 

REFERENCES 

Argos, P., Kamer, G., Nicklin, M. J. H., and Wimmer, E. (1984) 
Nucleic Acids Res. 12, 7251-7267 
Bazan, J. F., and Fletterick, R. J. (1988) Proc. Natl. Acad. Sci. U. S. A. 
85, 7872-7876 
Bazan, J. F., and Fletterick, R. J. (1989) Virology 171, 637-639 
Cheah, K-C., Sankar, S., and Porter, A. G. (1988) Gene (Amst.) 69, 
265-274 
Craik, C. S., Roczniak, S., Largman, C., and Rutter, W. J. (1987) 
Science 237, 909-913 
Dewalt, P. G., and Semler, B. L. (1987) J. Virol. 61, 2162-2170 
Drapeau, G. R. (1978) J. Bacteriol. 136, 607-613 
Franssen, H., Leunissen, J., Goldbach, R., Lomonossoff, G., and 
Zimmern, D. (1984) EMBO J. 3, 855-861 
Garnier, J., Osguthorpe, D. J., and Robson, B. (1978) J. Mol. Biol. 
120, 97-120 
Gorbalenya, A. E., Blinov, V. M., and Donchenko, A. P. (1986) FEBS 
Lett. 194, 253-257 
Gorbalenya, A. E., Donchenko, A. P., Blinov, V. M., and Koonin, E. 
V. (1989) FEBS Lett. 243, 103-114 
Graf, L., Craik, C. S., Patthy, A., Roczniak, S., Fletterick, R. J., and 
Rutter, W. J. (1987) Biochemistry 26, 2616-2623 
Gwaltney, J. M. (1975) Yale J. Biol. Med. 48, 17-45 
Hanecak, R., Semler, B. L., Ariga, H., Anderson, C. W., and Wimmer, 
E. (1984) Cell 37, 1063-1073 
Ivanoff, L. A., Towatari, T., Ray, J., Korant, B. D., and Petteway, S. 
R. (1986) Proc. Natl. Acad. Sci. U. S. A. 83, 5392-5396 
Jore, J., De Geus, B., Jackson, R. J., Pouwels, P. H., and Enger-Valk, 
B. E. (1988) J. Gen. Virol. 69, 1627-1636 
Klump, W., Marquardt, C., and Hofschneider, P. H. (1984) Proc. 
Natl. Acad. Sci. U. S. A. 81, 3351-3355 
Korant, B. (1973) J. Virol. 12, 556-563 
Korant, B. D., Brzin, J., and Turk, V. (1985) Biochem. Biophys. Res. 
Commun. 127, 1072-1076 
Krausslich, H-G., and Wimmer, E. (1988) Annu. Rev. Biochem. 57, 
701-754 
Kraut, J. (1977) Annu. Rev. Biochem. 46, 331-358 
Kunkel, T. A., Roberts, J. D., and Zakour, R. A. (1987) Methods 
Enzymol. 154, 367-382 
Lerner, R. A. (1984) Adv. Immunol. 36, 1-44 
Libby, R. T., Cosman, D., Cooney, M. K., Merriam, J. E., March, C. 
J., and Hopp, T. P. (1988) Biochemistry 27, 6262-6268 
Maniatis, T., Fritsch, E. F., and Sambrook, J. (1982) Molecular 
Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, 
Cold Spring Harbor, NY 
Nicklin, M. J. H., Toyoda, H., Murray, M. G., and Wimmer, E. (1986) 
Bio/Technology 4, 33-42 
Nicklin, M. J. H., Harris, K. S., Pallai, P. V., and Wimmer, E. (1988) 
J. Virol. 62, 4586-4593 
Nivison, H. T., and Hanson, M. R. (1987) Plant Mol. Biol. Rep. 5, 
295-309 
Parks, G. D., Baker, J. C., and Palmenberg, A. C. (1989) J. Virol. 63, 
1054-1058 
Pelham, H. R. B. (1978) Eur. J. Biochem. 85, 457-462 
Richards, O. C., Ivanoff, L. A., Bienkowska-Szewczyk, K., Butt, B., 
Petteway, S. R., Rothstein, M. A., and Ehrenfeld, E. (1987) Virology 
161, 348-356 
Sancar, A., Hack, A. M., and Rupp, W. D. (1979) J. Bacteriol. 137, 
692-693 
Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. 
Sci. U. S. A. 74, 5463-5467 
Skern, T., Sommergruber, W., Blaas, D., Gruendler, P., Fraundorfer, 
F., Pieler, C., Fogy, I., and Kuechler, E. (1985) Nucleic Acids Res. 
13, 2111-2126 
Sprang, S., Standing, T., Fletterick, R. J., Stroud, R. M., Finer-Moore, 
J., Xuong, N-H., Hamlin, R., Rutter, W. J., and Craik, C. 
S. (1987) Science 237, 905-908 
Stanway, G., Hughes, P. J., Mountford, R. C., Minor, P. D., and 
Almond, J. W. (1984) Nucleic Acids Res. 12, 7859-7875 
Stott, E. J., and Killington, R. A. (1972) Annu. Rev. Microbiol. 26, 
503-524 
Strebel, K., Beck, E., Strohmaier, K., and Schaller, H. (1986) J. Virol. 
57, 983-991 
Werner, G., Rosenwirth, B., Bauer, E., Seifert, J-M., Werner, F-J., 
and Besemer, J. (1986) J. Virol. 57, 1084-1093 
Ypma-Wong, M. F., Dewalt, P. G., Johnson, V. H., Lamb, J. G., and 
Semler, B. L. (1988) Virology 166, 265-270 





Exhibit 10 



The Catalytic Role of the Active Site Aspartic Acid in Serine Proteases 







Charles S. Craik; Steven Roczniak; Corey Largman; William J. Rutter 
Science, New Series, Vol. 237, No. 4817 (Aug. 21, 1987), 909-913. 
Stable URL: 

http://links.jstor.org/sici?sici=0036-8075%2819870821%293%3A237%3A4817%3C909 
Science is currently published by American Association for the Advancement of Science. 






squares with the computer program CORELS (10). 
The positional parameters of individual atoms were 
then refined subject to stereochemical restraints by 
using the subcell data (tf). The positions of missing 
side-chain atoms and those of the benzamidine and 
calcium were determined from the subcell difference 
electron density map computed from the refined 
model. A model of the full crystallographic asymmetric 
unit in the correct P2₁2₁2₁ unit cell was then 
constructed by adding a replicate of the trypsin 
molecule translated by 46 Å along the * and 32 Å 
along c. The full model was refined in three stages. 
In each stage the model was refit to a difference 
Fourier map computed with the coefficients 
(2Fobs − Fcalc). Strong peaks in the electron density 
in positions consistent with hydrogen bond contacts 
to the protein or other established solvent positions 
were included in the model as ordered solvent. Next, 
the positional and thermal parameters of all atoms 
were refined by iterations of restrained crystallographic 
least squares, with data in the resolution 
range 6 Å to 2.3 Å. Refinement was stopped 
when further cycles failed to reduce the crystallographic 
R factor and when the mean shift in coordinate 
positions was less than 0.05 Å. Refined coordinates 
were then used to compute phases for a new 
electron map to be used in the next stage of manual 
refitting. After the third stage (R factor = 0.18), 
examination of the electron density failed to reveal 
errors or ambiguity in main- or side-chain positions, 
although the side chains of six residues located at the 
surface of the molecules were disordered and could 
not be defined. Up to this point, side-chain atoms 
for His57, Asn102, or Ser195 had been excluded from 
the model. A difference electron density map 
(Fobs − Fcalc) revealed strong and well-ordered density 
for the Asn102 and Ser195, but the His57 residue 
appeared to be statistically disordered (Fig. 2, top) 
(11). 

10. J. L. Sussman, S. R. Holbrook, G. M. Church, S. H. 
Kim, Acta Crystallogr. A32, 311 (1976). 

11. The possibility that one or other of the peaks are 
artifactual was tested by independent refinement of 
two alternative models: one with His57 fit to the 
stronger, internal density and the second with His57 
fit to the external density. In each model the His57 
atoms were assigned full occupancy and side-chain 
positions for Asn102 and Ser195 were included. Each 
model was subjected to restrained crystallographic 
refinement by varying the thermal and positional 
parameters of all atoms. Subsequently, a difference 
Fourier map (Fobs − Fcalc) was computed for each 
model with the use of the refined positional and 
thermal parameters for all of the atoms in the 
respective models. In both cases, residual electron 
density appeared at the alternative histidine site. 
Again, the observed density peaks were contiguous 
with the Cβ atom of His57 and thus could not be 
interpreted as ordered water molecules. The relative 
occupancy of the two histidine positions and the 
total occupancy of both positions relative to other 
histidine side chains was estimated by integration of 
difference electron density at all of the histidine 
side-chain positions in one of the trypsin molecules in the 
asymmetric unit. The difference Fourier map 
(Fobs − Fcalc) for integration was computed 
from a model in which the side-chain atoms of all 
four histidine residues (at sequence positions 40, 57, 
70, and 87) were removed from the coordinate set 
of one molecule. Integration was performed manually 
by summing over all grid points within 2.0 Å of 
histidine atomic positions that had electron density 
at least one standard deviation greater than the 
background density. After normalization the apparent 
relative integrated difference densities at the 
histidine side-chain positions were: His40, 0.87; 
His57, 0.60; His70, 0.79; and His87, 1.0. All but 
His57 are well ordered, so the range in integrated 
densities reflects thermal motion and experimental 
error. The sum of the density over the two His57 
side-chain sites is lower than the mean density of the 
well-ordered histidine side chains, but is consistent 
with the high B factors of His57 atoms at both 
positions. The relative occupancy of the alternative 
His57 positions was estimated by integrating the 
difference density at the Nδ1 and Cε1 atoms of the 
gauche conformer and the Cδ2 and Nε2 atoms of 
the trans conformer and by taking the ratio of the 
integrated densities for the two positions. The 
remaining histidine atoms were not included in the 
integration because the resolution of the data set did 
not allow the densities of the two conformers to be 
resolved at those positions. 

Final refined positional and thermal parameters 
for both trans and gauche conformers were determined 
by refining an atomic model in which both 
conformers were simultaneously included. Side-chain 
atoms of the gauche conformer were assigned 
occupancies of 0.67 and atoms of the trans isomer 
were assigned occupancies of 0.33 based on the 
estimate derived from the integration described 
above (12). After three final cycles of refinement of 
all thermal and positional parameters of both trypsin 
monomers in the asymmetric unit, the crystallographic 
R factor was 0.161. 

12. A modified version of PROTIN (obtained from J. 
Smith) does not generate restraints between alternate 
side-chain positions of a statistically disordered 
residue. This allows refinement of two conformations 
of an amino acid simultaneously. 

13. W. Bode and P. Schwager, J. Mol. Biol. 98, 693 
(1975). 

14. R. Henderson, ibid. 54, 341 (1970). 

15. An upper estimate of the mean error in atomic 
position is 0.25 Å. It was obtained by an analysis of 
the variation of crystallographic R factor as a function 
of resolution (16). 

16. V. Luzatti, Acta Crystallogr. 6, 142 (1953). 

17. A. A. Kossiakoff and S. A. Spencer, Biochemistry 20, 



SERINE PROTEASES FUNCTION IN 
many biological systems to hydrolyze 
specific polypeptide bonds. Trypsin, a 
well-studied member of this family, catalyzes 
the hydrolysis of peptide and ester 
substrates that contain lysyl or arginyl side 
chains. Serine proteases have the triad of 
residues Asp102, His57, and Ser195 at the 
active site (chymotrypsin numbering system). 
X-ray crystallographic studies reveal 
that these three residues are in close proximity, 
which suggests they may serve as a 
functional interacting unit responsible for 
bond formation and cleavage during catalysis 
(1). Numerous chemical and physical 



6462 (1981). 

18. M. Krieger et al., ibid. 15, 3458 (1976). 

19. M. N. G. James, A. R. Sielecki, G. D. Brayer, L. T. 
Delbaere, C. A. Bauer, J. Mol. Biol. 144, 43 (1980). 

20. P. H. Morgan et al., Proc. Natl. Acad. Sci. U.S.A. 69, 
3312 (1972). 

21. A. A. Kossiakoff et al., Biochemistry 16, 654 (1977); 
H. Fehlhammer, W. Bode, R. Huber, J. Mol. Biol. 
111, 415 (1977). 

22. J. L. Chambers et al., Biochem. Biophys. Res. Commun. 
59, 70 (1974). 

23. M. O. Jones and R. M. Stroud, Biochemistry, in press. 

24. D. M. Blow et al., Nature (London) 221, 337 
(1969). 

25. C. S. Craik et al., J. Biol. Chem. 259, 14255 (1984). 

26. The coordinates were obtained from the Protein 
Data Bank at Brookhaven National Laboratory. 

27. We thank J. Sadowsky, C. Nielsen, and E. Goldsmith 
for assistance with Area Detector data collection 
and processing and B. Montfort for assistance 
with crystallographic refinement calculations. We 
gratefully acknowledge grant support from NIH: 
AM31507 to S.R.S., GM24485 to R.M.S., and 
AM26081 to R.J.F.; from NSF: DMB8608086 to 
C.S.C. and PCM830610 to W.J.R.; a Bristol Meyer 
grant of Research Corporation and a CCRC grant 
to C.S.C. The coordinates of the D 102 N trypsin 
structure at pH 6 have been submitted to the Protein 
Data Bank at Brookhaven National Laboratory. 

29 September 1986; accepted 29 May 1987 



studies indicate that Ser195 and His57 play 
crucial roles in catalysis. For example, selective 
reaction of Ser195 with diisopropylfluor- 



C. S. Craik, Departments of Pharmaceutical Chemistry 
and of Biochemistry and Biophysics, University of California, 
San Francisco, San Francisco, CA 94143-0446. 
S. Roczniak, C. Largman, W. J. Rutter, Hormone 
Research Institute and Department of Biochemistry and 
Biophysics, University of California, San Francisco, San 
Francisco, CA 94143-0448. 



*Present address: NutraSweet Company, Mount Prospect, 
IL 60056. 

†Present address: Veterans Administration Hospital, 
Martinez, CA 94553, and Departments of Internal 
Medicine and Biological Chemistry, University of California, 
Davis, CA 95616. 



The Catalytic Role of the Active Site Aspartic Acid in 
Serine Proteases 

Charles S. Craik, Steven Roczniak,* Corey Largman,† 
William J. Rutter 



The role of the aspartic acid residue in the serine protease catalytic triad Asp, His, and 
Ser has been tested by replacing Asp102 of trypsin with Asn by site-directed mutagenesis. 
The naturally occurring and mutant enzymes were produced in a heterologous 
expression system, purified to homogeneity, and characterized. At neutral pH the 
mutant enzyme activity with an ester substrate and with the Ser195-specific reagent 
diisopropylfluorophosphate is approximately 10⁴ times less than that of the unmodified 
enzyme. In contrast to the dramatic loss in reactivity of Ser195, the mutant trypsin 
reacts with the His57-specific reagent, tosyl-L-lysine chloromethylketone, only five 
times less efficiently than the unmodified enzyme. Thus, the ability of His57 to react 
with this affinity label is not severely compromised. The catalytic activity of the mutant 
enzyme increases with increasing pH so that at pH 10.2 the kcat is 6 percent that of 
trypsin. Kinetic analysis of this novel activity suggests this is due in part to participation 
of either a titratable base or of hydroxide ion in the catalytic mechanism. By 
demonstrating the importance of the aspartate residue in catalysis, especially at 
physiological pH, these experiments provide a rationalization for the evolutionary 
conservation of the catalytic triad. 



21 AUGUST 1987 



REPORTS 909 



ophosphate (DFP) (2) or modification of 
the His57 of trypsin with tosyl-L-lysine 
chloromethyl ketone (TLCK) (3) blocks 
catalytic activity. The collective data suggest 
that substrate hydrolysis is facilitated 
through nucleophilic attack by the Ser195 
hydroxyl oxygen on the carbonyl carbon of 
the substrate. Concomitantly the hydroxyl 
proton of the serine can be transferred to the 
imidazole of His57 and subsequently donated 
to the resulting leaving group (alcohol or 
amine) in the reaction. The remaining acyl 
enzyme intermediate is hydrolyzed by a 
mechanism that is the reverse of its formation 
except that water instead of Ser195 
serves as the nucleophile. The role of the 
buried carboxylate of Asp102 in the catalytic 
process remains to be clarified experimentally. 

The geometric relation of the amino acids 



Table 1. Ratios of activity for trypsin and D 102 N 
trypsin. Assays for Z-Lys-S-Bzl were performed at 
pH 7.15 and 10.18 (see legend to Fig. 1 for a 
description of the experimental conditions). Values 
for kobs/[I] with DFP were determined by the 
method of Kitz and Wilson (24). Standard conditions 
(25) were used except when the initial DFP 
concentration was 10 mM in assays with D 102 N 
trypsin at pH 10.03; background hydrolysis of 
DFP was relatively rapid and enzymatic activity at 
infinite times did not equal zero. In this case the 
kobs/[I] value (where [I] is the concentration of 
inhibitor) was determined by the method of 
Yoshimura et al. (26). Values of kobs/[I] from 
assays with trypsin were calculated to be 
790 ± 80 M⁻¹ min⁻¹ (pH 7.96) and 980 ± 
70 M⁻¹ min⁻¹ (pH 10.03). In assays with D 102 N 
trypsin these values were 0.070 ± 0.005 M⁻¹ 
min⁻¹ (pH 7.96) and 0.098 ± 0.019 M⁻¹ min⁻¹ 
(pH 10.03). Titrations with MUGB were followed 
at 360 nm on a Perkin-Elmer LS5 spectrofluorometer 
and performed in triplicate in 50 mM 
Hepes buffer, pH 7.5, that contained 2 µM 
MUGB. Titrations of trypsin were complete in 2 
seconds (the minimum detection time of the 
fluorometer) or less when enzyme concentrations 
ranged from 50 nM to 400 nM. Approximately 
17 minutes elapsed before a molar equivalence of 
MUGB reacted with 400 nM D 102 N trypsin. 
Values for kobs/[I] with TLCK were determined 
by the method of Kitz and Wilson (24); standard 
conditions were used (27). kobs/[I] values from 
assays with trypsin were calculated to be 
760 M⁻¹ min⁻¹ (pH 7.16) and 387 M⁻¹ min⁻¹ 
(pH 8.77). In assays with D 102 N trypsin these 
values were 149 M⁻¹ min⁻¹ (pH 7.16) and 
281 M⁻¹ min⁻¹ (pH 8.77). The instability of 
TLCK and MUGB at alkaline pH values precluded 
these assays at higher pH values. 





                Kinetic        Relative activity 
Ligand          constant       Neutral pH     Alkaline pH 

Z-Lys-S-Bzl     kcat           4,400          18 
Z-Lys-S-Bzl     kcat/Km        11,300         152 
DFP             kobs/[I]       11,300         10,000 
MUGB                           >500 
TLCK            kobs/[I]       5.1            1.4 
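
The DFP and TLCK entries of Table 1 can be checked against the rate constants quoted in its legend. The following is a minimal sketch, assuming that each tabulated "relative activity" is simply the trypsin value of kobs/[I] divided by the D 102 N value at the same pH; the dictionary layout and function name are illustrative and not part of the original report.

```python
# Second-order inhibition rate constants k_obs/[I] (M^-1 min^-1) as quoted in
# the Table 1 legend: (trypsin, D102N trypsin), keyed by ligand and assay pH.
K_OBS_OVER_I = {
    ("DFP", 7.96): (790.0, 0.070),
    ("DFP", 10.03): (980.0, 0.098),
    ("TLCK", 7.16): (760.0, 149.0),
    ("TLCK", 8.77): (387.0, 281.0),
}


def activity_ratio(wild_type: float, mutant: float) -> float:
    """Assumed definition: how many times faster trypsin reacts than the mutant."""
    return wild_type / mutant


# Ratio for every ligand/pH pair listed above.
ratios = {key: activity_ratio(*pair) for key, pair in K_OBS_OVER_I.items()}

if __name__ == "__main__":
    for (ligand, ph), ratio in ratios.items():
        print(f"{ligand} at pH {ph}: {ratio:,.1f}-fold")
```

Under this assumption the computed ratios round to the tabulated DFP and TLCK values, which suggests the table was derived this way.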






in the catalytic triad led to the postulate that 
Asp102 serves in concert with the histidine 
imidazole group to transfer the proton from 
the serine in a charge-relay mechanism (4). 
However, nuclear magnetic resonance 
(NMR) studies (5) showed that the Asp102 
and the His57 moieties displayed normal pKa 
values (Ka is the ionization constant); this is 
incompatible with the implications of the 
charge-relay mechanism (6). Furthermore, 
neutron diffraction and ¹H NMR studies of 
the imidazole nitrogens in the resting state 
of the enzyme show that no proton transfer 
occurs from His57 to Asp102 (7). Asp102 may 
be involved in the stabilization of the imidazolinium 
intermediate and the orientation of 
the correct tautomer of His57 relative to 
Ser195 and the substrate (8). However, a test 
of the function of Asp102 by selective chemical 
modification has not been possible because 
it is inaccessible to chemical reagents 
under nondenaturing conditions. We have 
evaluated the catalytic role of Asp102 by 
replacing this residue with Asn. This eliminates 
the negative charge with little change 
in the van der Waals surface of the side-chain 
atoms (NH₂ versus OH). 

Conversion of the Asp102 codon (GAC) 
to an Asn (AAC) codon within the rat 
anionic trypsinogen DNA (9) was accomplished 
by site-directed mutagenesis (10). 
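
The codon change described above amounts to a single base substitution. A small illustrative sketch, assuming the Asp102 codon is GAC (GAC and GAT are the two Asp codons in the standard genetic code, and only GAC is one base away from AAC); the table below holds just the four codons needed here and the helper name is hypothetical.

```python
# Minimal codon table: only the Asp and Asn codons of the standard genetic code.
CODON_TO_RESIDUE = {"GAC": "Asp", "GAT": "Asp", "AAC": "Asn", "AAT": "Asn"}


def base_differences(codon_a: str, codon_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, old base, new base) for each mismatching base."""
    if len(codon_a) != len(codon_b):
        raise ValueError("codons must have equal length")
    return [
        (index + 1, old, new)
        for index, (old, new) in enumerate(zip(codon_a, codon_b))
        if old != new
    ]


wild_type_codon = "GAC"  # assumed Asp102 codon
mutant_codon = "AAC"     # Asn codon named in the text
changes = base_differences(wild_type_codon, mutant_codon)

if __name__ == "__main__":
    print(CODON_TO_RESIDUE[wild_type_codon], "->", CODON_TO_RESIDUE[mutant_codon], changes)
```

Only the first base differs (G to A), so a single mismatch in the mutagenic oligonucleotide would be enough to make the D 102 N change under this assumption.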

Fig. 1. Profile of activities for trypsin and D 102 
N trypsin-catalyzed hydrolysis of Z-Lys-S-Bzl. 
(A) Plot of log(kcat/Km) versus pH and (B) plot of 
log kcat versus pH, for trypsin (●) and D 102 N 
trypsin (○). Assays were performed at 25°C in 50 
mM Mes [2-(N-morpholino)ethanesulfonic acid], 
Mops, or Taps buffers, pH 4.43 to 8.77, or 50 
mM glycine, pH 9.25 to 10.18, that contained 
0.1 M NaCl and 1 mM CaCl₂. Stock solutions of 
Z-Lys-S-Bzl and 4,4'-dithiodipyridine were prepared 
in water and dimethylformamide, respectively. 
The pH of all reactions was determined 
immediately after reaction. To a cuvette that 
contained 0.97 ml of the assay solution was added 
10 µl of a 25 mM solution of 4,4'-dithiodipyridine 
(final concentrations: 250 µM 4,4'-dithiodipyridine 
and 1% dimethylformamide) and 10 µl 
of a Z-Lys-S-Bzl stock solution. The concentration 
of substrate ranged from ten times greater 
than to ten times less than the Km of the enzyme. 
After the background rate of hydrolysis was measured 
spectrophotometrically (Beckman DU-7) at 
324 nm, 10 µl of an enzyme stock solution (in the 
case of trypsin, diluted in 0.5 mg per milliliter of 
bovine serum albumin) was added and the initial 
rate of hydrolysis was measured. At pH values 
greater than 9.25, for which the background 
hydrolysis was substantial (up to 2% Z-Lys-S-Bzl 
hydrolyzed per minute), a reference cell that 
contained substrate and 4,4'-dithiodipyridine was used during kinetic measurements. In all of the assays 
the initial rates were measured from data for the initial 5 to 10% of the hydrolysis of substrate. Z-Arg-S-Bzl 
was not used as substrate because this compound shows a background hydrolysis rate 20 times 
greater than that for Z-Lys-S-Bzl at alkaline pH (14). Substrate and enzyme concentration determinations 
were performed with standard procedures (29, 30). Values for kcat and kcat/Km parameters from all 
assays were derived by a program that performed a weighted linear and nonlinear least squares regression 
analysis of data by using the Lineweaver-Burk and Michaelis-Menten equations, respectively (31). 
Double reciprocal plots of the data were linear in all cases. Values of pKa and the maximal rate constants were determined by 
the program MULTI (32), which performs a nonlinear least squares analysis of the data. 
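
The double reciprocal (Lineweaver-Burk) analysis named in the legend can be illustrated with a short sketch. This is not the program cited in the legend: it uses an unweighted least squares line rather than a weighted fit, and the rate data are synthetic, generated from hypothetical Vmax and Km values, with substrate concentrations spanning one tenth to ten times Km as in the assays.

```python
def michaelis_menten(substrate: float, vmax: float, km: float) -> float:
    """Initial rate predicted by the Michaelis-Menten equation."""
    return vmax * substrate / (km + substrate)


def lineweaver_burk_fit(substrate: list[float], rates: list[float]) -> tuple[float, float]:
    """Unweighted least squares line through (1/[S], 1/v); returns (Vmax, Km)."""
    x = [1.0 / s for s in substrate]
    y = [1.0 / v for v in rates]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                    # slope of the plot is Km / Vmax
    intercept = mean_y - slope * mean_x  # intercept of the plot is 1 / Vmax
    vmax = 1.0 / intercept
    return vmax, slope * vmax


# Hypothetical parameters in arbitrary units; not values from the report.
TRUE_VMAX, TRUE_KM = 100.0, 5.0
substrate = [TRUE_KM * factor for factor in (0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0)]
rates = [michaelis_menten(s, TRUE_VMAX, TRUE_KM) for s in substrate]
fit_vmax, fit_km = lineweaver_burk_fit(substrate, rates)

if __name__ == "__main__":
    print(f"Vmax = {fit_vmax:.3f}, Km = {fit_km:.3f}")
```

With noise-free data the line recovers the input parameters. With real measurements the reciprocal transform inflates errors at low substrate concentration, which is presumably why the authors used a weighted fit.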

SCIENCE, VOL. 237 



The DNA that encodes the mutant enzyme 
was sequenced in its entirety to ensure that 
no inadvertent base changes were introduced 
during the mutagenesis procedure. 
The mutant enzyme trypsin102 (Asp→Asn), 
referred to as D 102 N trypsin, and the 
naturally occurring trypsin were expressed 
under the control of the simian virus 40 
(SV40) early promoter (11) in stably transformed 
eukaryotic cell lines that secreted the 
zymogen form of the enzymes into the 
culture medium (12). D 102 N trypsin and 
trypsin were purified to homogeneity and 
crystallinity by a combination of ion-exchange 
and affinity chromatography techniques. 
Trypsin isolated from this expression 
system displayed physical and catalytic 
properties identical to trypsin purified from 
the rat pancreas. In contrast, D 102 N 
trypsin exhibited dramatically different catalytic 
activity. 

The activities of trypsin and D 102 N 
trypsin toward various substrates and inhibitors 
are compared in Table 1. At neutral pH 
the catalytic efficiency of D 102 N trypsin as 
measured by its ability to hydrolyze the 
ester substrate N-benzyloxycarbonyl-L-lysine 
thiobenzyl ester (Z-Lys-S-Bzl) is severely 
compromised (kcat or kcat/Km values are 
~10⁴ times lower than that of trypsin; kcat is 
the catalytic rate constant and Km is the 



Michaelis constant). However, the relative 
activity of the mutant enzyme progressively 
increases with increasing pH values. To determine 
the relative reactivity of Ser195 and 
His57 both enzymes were treated with the 
specific active site-directed reagents DFP 
and TLCK. The inhibition of D 102 N 
trypsin by DFP, which is specific for Ser195, 
is approximately four orders of magnitude 
slower than that of trypsin at both pH 8.0 
and pH 10.0. The active site titrant 
4-methylumbelliferyl-p-guanidinobenzoate 
(MUGB) (13) also reacts with D 102 N 
trypsin at a rate at least 500-fold slower than 
with trypsin at pH 7.5. These data suggest 
that the nucleophilicity of Ser195 is 
dependent on the negative charge of 
Asp102. 

The substrate analog TLCK reacts specifically 
with His57, presumably because the 
binding pocket of the substrate positions the 
reactive chloromethyl-ketone group adjacent 
to His57. In contrast to the large decreases 
in activity monitored with DFP and 
MUGB, TLCK is five times less reactive 
with D 102 N trypsin than with trypsin at 
neutral pH (pH 7.2) and one and a half 
times less reactive at more alkaline pH (pH 
8.8). Thus the active site reacts virtually 
normally with the affinity reagent. The differential 
effect of the Asp to Asn substitution 
on the inhibition of D 102 N trypsin by 
DFP and TLCK may be due to differences 
in the proximity of the reactive groups of the 
inhibitors and the enzyme. However, a 
more likely explanation is that the imidazole 
of His57 in D 102 N trypsin is not in the 
correct tautomeric state for removal of the 
Ser195 proton and thereby reduces the reactivity 
of the enzyme to DFP. However, 
His57 can still react with the chloromethyl 
ketone moiety of TLCK and thereby inhibit 
the enzyme. 

The modified and unmodified enzymes 
exhibit different pH activity profiles for the 
ester substrate (Table 1 and Fig. 1). Similar 
data have been obtained with peptide substrates 
(14). In agreement with studies on 
bovine cationic trypsin (15), rat anionic 
trypsin shows a sigmoidal dependence of 
activity (pKa = 6.8) with maximal kcat and 
kcat/Km values of 7498 ± 254 min⁻¹ and 
1.20 ± 0.28 × 10⁹ M⁻¹ min⁻¹, respectively 
(16, 17). The rat enzyme resembles porcine 
elastase (18) but differs from bovine trypsin 
in being alkaline stable. The dominant effect 
of the Asp to Asn mutation is on kcat. The 
Km values of the two enzymes are similar at 
any given pH value. The D 102 N trypsin 
activity is dramatically lower (~10⁴ times as 
measured by kcat or kcat/Km) than trypsin 
activity at neutral pH values; however, it 
increases progressively at alkaline pH values 
from the low value at neutral pH to values 




Fig. 2. The pH dependence of the kinetic parameter kcat/Km of D 102 N trypsin-catalyzed 
hydrolysis of Z-Lys-S-Bzl. The points correspond to the experimentally derived kcat/Km values. 
Curve A is derived from substituting the calculated rate and equilibrium constants kOH, kenz, K1, 
and K2 into Eq. 1. Values for kOH and K2 were determined from assays performed from pH 
8.36 to 10.18 where it is assumed that K1 >> [H⁺] and kOH[OH⁻] >> kenz. Equation 
1 can then be simplified and rearranged to describe a straight line: 
$(k_{cat}/K_m)[H^+] = -K_2 (k_{cat}/K_m) + (10^{-14}) k_{OH}$. Linear regression of this line yields kOH and K2 values of 
1.45 ± 0.12 × 10¹¹ M⁻² min⁻¹ and 1.21 ± 0.30 × 10⁻¹⁰ M, respectively. Values of K1 and kenz were 
determined from assays performed from pH 4.43 to 7.33 where [H⁺] >> K2. By using the kOH value 
determined above, Eq. 1 can again be simplified to a linear form: 
$[(k_{cat}/K_m)[H^+] - 1.45 \times 10^{-3}]/[H^+] = (1/K_1)[1.45 \times 10^{-3} - (k_{cat}/K_m)[H^+]] + k_{enz}$. Linear regression analysis of this line yields kenz 
and K1 values of 4.78 ± 0.22 × 10⁴ M⁻¹ min⁻¹ and 3.67 ± 0.32 × 10⁻⁶ M, respectively. Inset: Plot of 
kcat/Km versus pH from pH 4.43 to 7.33. Curve A is the same as described above. Curve B describes the 
contribution to the catalytic rate of D 102 N trypsin that depends on [OH⁻]: $k_{OH}[OH^-]/(1 + K_2/[H^+])$. 
Curve C describes the contribution to the catalytic rate of D 102 N trypsin independent of 
[OH⁻] detected at lower pH values: $k_{enz}/[1 + ([H^+]/K_1) + (K_2/[H^+])]$. Note that curve A is the sum of 
curves B and C. The dotted line perpendicular to the abscissa is the pKa of the mutant enzyme calculated 
from the inflection point of the activity profile. 
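
The decomposition of curve A into curves B and C can be reproduced numerically. The sketch below assumes that Eq. 1 is the sum of the two terms given for curves B and C, takes the water ion product as 10⁻¹⁴ (as in the linear forms above), and uses the fitted constants quoted in the legend; the powers of ten are an assumption on our part, chosen to agree with the pK values of Table 2 and the activity ratios of Table 1. Function and variable names are illustrative only.

```python
import math

KW = 1.0e-14     # ion product of water, as used in the legend's linear forms
K_OH = 1.45e11   # hydroxide-dependent rate constant for kcat/Km (assumed exponent)
K_ENZ = 4.78e4   # hydroxide-independent kcat/Km, M^-1 min^-1 (assumed exponent)
K1 = 3.67e-6     # acidic ionization constant, M
K2 = 1.21e-10    # alkaline ionization constant, M


def hydroxide_term(ph: float) -> float:
    """Curve B: kOH[OH-] / (1 + K2/[H+])."""
    h = 10.0 ** (-ph)
    return K_OH * (KW / h) / (1.0 + K2 / h)


def enzyme_term(ph: float) -> float:
    """Curve C: kenz / (1 + [H+]/K1 + K2/[H+])."""
    h = 10.0 ** (-ph)
    return K_ENZ / (1.0 + h / K1 + K2 / h)


def kcat_over_km(ph: float) -> float:
    """Curve A: sum of the hydroxide-dependent and hydroxide-independent terms."""
    return hydroxide_term(ph) + enzyme_term(ph)


if __name__ == "__main__":
    print(f"pK1 = {-math.log10(K1):.1f}, pK2 = {-math.log10(K2):.1f}")
    for ph in (4.43, 7.15, 8.36, 10.18):
        print(f"pH {ph:5.2f}: kcat/Km = {kcat_over_km(ph):.3g} M^-1 min^-1")
```

With these constants the hydroxide-independent term dominates near neutral pH and the hydroxide-dependent term dominates above about pH 8, which matches the ascending alkaline limb described in the text.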

Table 2. Values for kOH, kenz, and pKa derived from the D 102 N trypsin-catalyzed hydrolysis of 
Z-Lys-S-Bzl. The kOH, kenz, and pKa parameters derived from kcat/Km values were determined as described 
in the legend to Fig. 2. The pK2 values for k2 and k3 were not determined due to experimental 
constraints described below. The kcat parameter does not appear to depend on the ionization of a 
residue in the pH range between 4 and 8. Equation 1 can then be reduced to: 

\[ k_{cat} = \frac{k_{enz}}{1 + (K_2/[H^+])} + \frac{k_{OH}[OH^-]}{1 + (K_2/[H^+])} \]

Values for kOH and K2 can be determined from assays performed at pH values of 8 and greater where it 
is assumed that kOH[OH⁻] >> kenz. The equation can then be rearranged to the linear form 
$k_{cat}[H^+] = -K_2 k_{cat} + (10^{-14}) k_{OH}$. Linear regression analysis of this line with data from assays performed from 
pH 7.96to 10.18 yields a *oh value of 5.50 - 0.21 x lO-Vtf- min- and a JC, yalueof 5^9 ± 0.50 x 
10" ' ^M. The vahic of *cnz can be estimated from assays performed at/»H values less dwn 8 where J 
>> K2. By using the kOH value determined above the equation can be reduced to kenz = 
kcat − 5.50 × 10⁶[OH⁻]. Subtracting the calculated 5.50 × 10⁶[OH⁻] values from the experimentally 
derived kcat values from pH 4.43 to pH 7.33 gives a kenz value of 0.37 ± 0.09 min⁻¹. The pH 
dependence of the acylation rate constant k2 of the D 102 N trypsin-catalyzed hydrolysis of Z-Lys-S-Bzl 
was determined by performing assays at 25°C in 50 mM Mes, Mops, or Taps buffers, pH 4.81 to 8.36 
under identical conditions as for assays described in the legend to Fig. 1 except that D 102 N trypsin 
concentrations (4 to 40 µM) were in large excess over the initial substrate concentration (0.54 µM) and 
the reaction was allowed to proceed to completion. Assays performed at pH values above pH 8.4 were 
too fast to follow spectrophotometrically, thereby preventing the determination of k2 (acylation) values. 
Values for k2 and Ks were determined by the procedure of Kezdy and Bender (28). The kOH 




(*,IH*] - 4.91 X 10-»)/[H*) = (1//C,)(4.91 x lO"' - *2lH*]) + *, 

Linear regression analysis of diis line widi kj values determined from assays pciforrned from 4.81 to 
pH 6.70 yielded a Ae«/value of 1 .32 S: 0.08 min" ' and a if, value of 5.35 ± 1.00 X 10 Values for 
A, (deacylation) were calculated using xhc cxpcrimentaUy derived fcc« and *j values and die equation: 
A, = (k .ktVik-, - A...). The *oh vJue was determined from a plot of the *, values versus solvent 
hyJn^^d^ ion con^^^^ pH 6.70 to 8.36; *oh = 4^7 ± 2.43 x lO'^fj' min". The 

maximal value of the deacylation rate constant of the hydroxide- mdcpcndent pathway, *cr«, w 
calculated by incorporating die A.„, values for Aj and A^., determined above into the equation A, - A 
kMi - Ac,,). This gives a (deacylation) of 0.51 ± 0.07 min''. The value of A, bkc A^., shows 1 



was 



cat 

no 



Kate 
constant 


*OH . 

(Af-' min"') 


(min-') 


pK, 


pK2 


kcm^l^m 

*2 

*> 


5.50 X 10* 
1,45 X 10"Af-' 
4.91 X 10* 
4.17 X 10' 


0.37 

4.78 X 10*Af-' 

1.32 

0.51 


5.4 
5.3 


10.2 
9-9 
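
The deacylation entry in Table 2 can be cross-checked against the other hydroxide-independent values using the relation given in the legend, k3 = kcat·k2/(k2 − kcat). The following is a minimal sketch of that arithmetic; the function name `deacylation_rate` is hypothetical, and the inputs are the kenz values for kcat (0.37 min⁻¹) and k2 (1.32 min⁻¹) read from the table.

```python
def deacylation_rate(k_cat: float, k2: float) -> float:
    """Return k3 for an acyl-enzyme mechanism.

    Rearranged from k_cat = k2*k3 / (k2 + k3), giving
    k3 = k_cat*k2 / (k2 - k_cat).
    """
    if k2 <= k_cat:
        raise ValueError("k2 must exceed k_cat for k3 to be positive")
    return k_cat * k2 / (k2 - k_cat)


# Hydroxide-independent (k_enz) values from Table 2, in min^-1.
k3 = deacylation_rate(k_cat=0.37, k2=1.32)
print(f"k3 = {k3:.2f} min^-1")
```

The result agrees with the tabulated 0.51 min⁻¹, and because k3 is only slightly larger than kcat, deacylation is close to rate limiting on this pathway.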






that approach those of the native enzyme (kcat 6%; kcat/Km 1%) at pH 10.2.
The ascendant alkaline limb of the activity-pH profiles of the D102N trypsin is not
an artifact due to deamidation of the Asn residue to Asp, since mutant enzyme activity
at neutral pHs is not affected by preincubation at alkaline pH. Furthermore, one would
expect the pH activity profiles to be similar in shape to those of the naturally occurring
enzyme if they merely reflected contamination by trypsin. We ascribe this ascendant
basic limb to the participation of a titratable base or bases or of OH⁻ itself. Although the
mechanism of catalysis by the D102N trypsin is unknown, the pH rate profile of
kcat/Km can be described by a bipartite rate equation in which one part represents the
catalytic rate detected at the lower pH values and the other part describes the catalytic rate
that shows a dependence on hydroxide ion concentration (19). The observed rate constant
kcat/Km can be defined as:



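$$\frac{k_{cat}}{K_m} = \frac{k_{enz}}{1 + ([\mathrm{H}^+]/K_1) + (K_2/[\mathrm{H}^+])} + \frac{k_{OH}[\mathrm{OH}^-]}{1 + (K_2/[\mathrm{H}^+])} \qquad (1)$$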



where kenz is the rate constant of the hydroxide-independent pathway, K1 and K2 are the
dissociation constants of the ionizing groups, and kOH is the rate constant of the
hydroxide ion-dependent pathway. The catalytic activity of the OH⁻-activated and
OH⁻-independent pathways can be resolved with Eq. 1. Values for kcat/Km determined
from mutant enzyme activity studies above pH 8.0 show an increase with solvent
hydroxide ion concentration that yields kOH and K2 values of
1.45 ± 0.12 × 10^[illegible] M⁻² min⁻¹ and 1.21 ± 0.30 × 10⁻¹⁰ M (pK2 = 9.9), respectively.
Between pH 8.0 and pH 8.8 the kcat/Km values increase linearly with hydroxide ion
concentration. The slight decrease from linearity above pH 8.8 may reflect the ionization
of another group with an alkaline pKa value such as the lysine substrate or the
amino-terminal group of the protein (20).
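
To make the two pathways of Eq. 1 concrete, the sketch below evaluates the hydroxide-independent and hydroxide-dependent terms separately as a function of pH. The dissociation constants correspond to the reported pK1 = 5.4 and pK2 = 9.9 for kcat/Km. The values passed for kenz and kOH are placeholders chosen only for illustration, and the function names and Kw = 10⁻¹⁴ are assumptions rather than part of the original report.

```python
import math

KW = 1e-14  # ion product of water near 25 °C (assumed)


def rate_terms(pH, k_enz, k_oh, K1, K2):
    """Return the (OH-independent, OH-dependent) terms of Eq. 1 at a given pH."""
    h = 10.0 ** (-pH)   # [H+]
    oh = KW / h         # [OH-]
    independent = k_enz / (1.0 + h / K1 + K2 / h)
    dependent = k_oh * oh / (1.0 + K2 / h)
    return independent, dependent


def pK(K):
    """Convert a dissociation constant (M) to its pK."""
    return -math.log10(K)


K1, K2 = 3.67e-6, 1.21e-10   # M, dissociation constants for kcat/Km
K_ENZ, K_OH = 1.0, 1.0e6     # placeholder rate constants, NOT the measured values

for pH in (5.0, 7.0, 9.0, 10.0):
    c, b = rate_terms(pH, K_ENZ, K_OH, K1, K2)
    print(f"pH {pH:4.1f}: independent={c:.3e}  dependent={b:.3e}  total={c + b:.3e}")
print(f"pK1 = {pK(K1):.1f}, pK2 = {pK(K2):.1f}")
```

Keeping the two terms separate mirrors the decomposition of the activity profile into an OH⁻-independent and an OH⁻-dependent curve, whose sum is the observed kcat/Km.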

There is good agreement between the calculated kcat/Km curve derived from Eq. 1
and the experimentally derived values (Table 2 and Fig. 2). Measurements of kcat/Km
values below pH 8.0 yield kenz and K1 values of 4.78 ± 0.22 × 10^[illegible] M⁻¹ min⁻¹ and
3.67 ± 0.32 × 10⁻⁶ M (pK1 = 5.4), respectively. A comparison of the kenz value for D102N
trypsin and the maximal kcat/Km value for trypsin indicates that the activity of the
mutant enzyme (ignoring the contribution of the OH⁻-dependent pathway) is 25,000
times less than that of trypsin. Thus Asp102 is crucial for the catalytic activity at neutral
pH values. However, the rate of hydrolysis by the mutant enzyme is still 400 times
greater than the rate of solvent hydrolysis of the substrate. The inflection points of the
curves in Fig. 2 suggest that the pKa of His57 has decreased 1.5 pH units in D102N
trypsin compared to trypsin. The putative alteration in the pKa value of His57 reflects
the replacement of the negatively charged carboxylate group with a neutral amide
group. The mutant enzyme exhibits classic burst kinetics on ester substrates below pH
7.0. This implies that an acyl enzyme intermediate accumulates and that deacylation is
rate determining in this pH range (14).

It has been suggested that Asp102 controls the position of the neighboring His57
residue that in turn modulates the polarity of the Ser195 (8). Our demonstration of the
crucial role of Asp102 is not surprising in view of the strict evolutionary conservation
of this residue within the catalytic triad. The magnitude of the catalytic defect from the
Asp102 → Asn replacement and the alkaline activation of the enzyme are unexpected.
The three-dimensional structure of D102N trypsin is virtually identical to that of trypsin
in the alkaline pH range (21). Thus the activity of the mutant enzyme arises from an
active site conformation that resembles the native structure. Certain properties of the
D102N trypsin superficially resemble chymotrypsin methylated at His57 (22). The
activity of both enzymes is dramatically lower at neutral pH values and increases in
proportion to OH⁻ concentration. However, the rate constant ascribed to the reaction with
OH⁻ ions is 1000 times greater for the D102N trypsin mutant than for chymotrypsin
with the modified histidine. Nevertheless, these results are consistent with the
view that compromising the function of the histidine dramatically decreases catalytic
activity at neutral pH values. This defect can be partly overcome at basic pH. The alkaline
pH may affect the catalytic reaction indirectly by affecting the ionization of groups that
function in catalysis. Alternatively, OH⁻ might participate directly in the reaction;
this would require activation at very low hydroxide ion concentrations. The overall
catalytic mechanism of the D102N trypsin activity is unknown at present. The activity
may be due in part to a nucleophilic contribution from the imidazole nitrogen of His57
instead of Ser195, as has been detected in the cleavage of active esters of nonspecific
substrates (23). Alternatively, a residue distant from the active site may contribute to
stabilization of the tetrahedral intermediate at basic pH. Whatever the mechanism of
action, D102N trypsin displays distinctive properties that distinguish it from trypsin.
Its low activity in the neutral pH range makes it an unattractive catalyst for most
biological functions; thus it might not be expected to persist in evolution. The Asn
mutant, however, is of considerable interest as a distinctive serine protease. This work
illustrates the potential for creating new variants that are not found in nature because
they are active under extreme conditions that are usually incompatible with cellular
environments.

REFERENCES AND NOTES 

1. J. J. Birktoft and D. M. Blow, J. Mol. Biol. 68, 187 (1972); A. Tulinsky et al.,
Biochemistry 12, 4185 (1973); R. M. Stroud et al., J. Mol. Biol. 83, 185 (1974);
R. Huber et al., ibid. 89, 73 (1974); L. Sawyer et al., ibid. 118, 137 (1978);
W. Bode et al., ibid. 164, 237 (1983); C. S. Wright et al., Nature (London) 221, 235
(1969); G. D. Brayer et al., J. Mol. Biol. 124, 261 (1978); P. W. Codding et al.,
Can. J. Biochem. 52, 208 (1974); G. D. Brayer et al., J. Mol. Biol. 131, 743 (1979).

2. G. H. Dixon, S. Go, H. Neurath, Biochim. Biophys. Acta 19, 193 (1956).

3. E. Shaw, M. Mares-Guia, W. Cohen, Biochemistry 4, 2219 (1965).

4. D. M. Blow, J. J. Birktoft, B. S. Hartley, Nature (London) 221, 337 (1969).

5. W. W. Bachovchin and J. D. Roberts, J. Am. Chem. Soc. 100, 8041 (1978).

6. G. A. Rogers and T. C. Bruice, ibid. 96, 2473 (1974).

7. A. A. Kossiakoff and S. A. Spencer, Nature (London) 288, 414 (1980);
J. L. Markley and I. B. Ibanez, Biochemistry 22, 4627 (1978).

8. L. Polgar and M. L. Bender, Proc. Natl. Acad. Sci. U.S.A. 64, 1335 (1969);
A. R. Fersht and J. Sperling, J. Mol. Biol. 74, 137 (1973).

9. C. S. Craik et al., Science 228, 291 (1985).

10. M. J. Zoller and M. Smith, DNA 3, 479 (1984).

11. P. J. Southern and P. Berg, J. Mol. Appl. Genet. 4, 327 (1982).

12. This was accomplished as follows: Chinese hamster ovary cells were co-transfected
with a plasmid that contained either the trypsinogen or D102N trypsinogen DNA
constructs under transcriptional control of the T-antigen early promoter of SV40 and a
plasmid that encoded the bacterial phosphotransferase gene (neo). The neo gene
conferred resistance to the aminoglycoside antibiotic G418 and permitted the
phenotypic selection of a cell line with high probability of co-expressing the
trypsinogen gene. A filter screening assay was developed for detecting high levels of
protein secretion from transfected cells in order to isolate cell lines that overproduced
trypsinogen (C. S. Craik and R. L. Burke, unpublished results). Cell lines that produced
trypsinogens in large amounts (about 10 mg/liter) were then expanded into mass
culture (40 liters).

13. J. C. McRae et al., Biochemistry 20, 7196 (1981).

14. C. S. Craik et al., unpublished results.

15. T. Inagami and J. M. Sturtevant, Biochim. Biophys. Acta 38, 64 (1960);
H. P. Kasserra and K. J. Laidler, Can. J. Chem. 47, 4021 (1969).

16. The catalysis constant kcat is used as a measure of catalytic activity and is a
first-order rate constant that refers to the properties and reactions of the
enzyme-substrate, enzyme-intermediate, and enzyme-product complexes. The Michaelis
constant Km relates to the binding affinity of the enzyme for its substrate and is an
apparent dissociation constant that may be treated as the overall dissociation constant
of all enzyme-bound species. The ratio kcat/Km is an apparent second-order rate constant
that refers to the properties and reactions of the free enzyme and free substrate (17).

17. A. Fersht, Enzyme Structure and Mechanism (Freeman, New York, 1985).

18. P. Geneste and M. L. Bender, Proc. Natl. Acad. Sci. U.S.A. 64, 683 (1969).

19. The equation fits the experimental points and suggests a mechanism that involves a
combination of an acid HA and OH⁻. However, by the principle of kinetic equivalence it
cannot be distinguished from a mechanism involving A⁻.

SCIENCE, VOL. 237

20. M. L. Bender et al., J. Am. Chem. Soc. 86, 3680 (1964); A. Himoe and G. P. Hess,
Biochem. Biophys. Res. Commun. 23, 234 (1966).

21. S. Sprang et al., Science 237, 905 (1987).

22. R. Henderson, Biochem. J. 124, 13 (1971); J. Fastrez and N. Houyet,
Eur. J. Biochem. 81, 515 (1977).

23. C. D. Hubbard and J. F. Kirsch, Biochemistry 11, 2483 (1972).

24. R. Kitz and I. B. Wilson, J. Biol. Chem. 237, 3245 (1962).

25. Incubations with diisopropylfluorophosphate (DFP) were performed at 25°C with
100 nM wild-type or mutant enzyme and varying concentrations of DFP in either 50 mM
Taps (3-{[tris(hydroxymethyl)methyl]amino}propanesulfonic acid), pH 7.96, or 50 mM
glycine, pH 10.03, that contained 0.1 M NaCl, 1 mM CaCl2, 0.005% (w/v) Triton X-100
and 5% (v/v) isopropanol. DFP stock solutions were made up in isopropanol. Final
volumes were 0.200 ml and 6.2 ml when incubations were performed with trypsin and
D102N trypsin, respectively. Trypsin enzyme activities were measured
spectrophotometrically at 324 nm by adding 10 µl of the trypsin-DFP solution to 0.99 ml
of the same buffer (1 nM trypsin final concentration) that contained 60 µM
N-benzyloxycarbonyl-L-lysine benzylthioester (Z-Lys-S-Bzl) and 250 µM
4,4'-dithiodipyridine but no DFP. Mutant enzyme activities at 324 nm were determined
by adding 10 µl of 6 mM Z-Lys-S-Bzl and 10 µl of 25 mM 4,4'-dithiodipyridine to
0.98 ml of the D102N trypsin-DFP solution. The concentrations of DFP during
incubations with trypsin at both pH values were 0, 20, 25, 40, 80, or 200 µM. In
incubations with D102N trypsin, initial DFP concentrations were 0, 5, 8, 10, or 12.5 mM,
and 0, 10, 12.5, or 16.6 mM when assays were performed at pH 7.96 and pH 10.03,
respectively.

26. T. Yoshimura, L. N. Barker, J. C. Powers, J. Biol. Chem. 257, 5077 (1982).

27. Incubations with tosyl-L-lysine chloromethyl ketone (TLCK) were performed at 25°C
with 5 µM unmodified or mutant enzyme in 100 µl of either 100 mM Mops
[3-(N-morpholino)propanesulfonic acid], pH 7.16, or 100 mM Taps, pH 8.77, that
contained either 0 or 200 µM TLCK. Immediately before assaying trypsin activity an
aliquot from the incubation mixtures was diluted 20-fold in 50 mM incubation buffer
that contained 0.1 M NaCl, 1 mM CaCl2, and 0.005% (w/v) Triton X-100. Ten microliters
of the diluted enzyme solution was added to 0.99 ml of the dilution buffer that contained
250 µM 4,4'-dithiodipyridine and 100 µM Z-Lys-S-Bzl (assay buffer); enzyme activities
were followed at 25°C spectrophotometrically at 324 nm (2.5 nM trypsin). Mutant
enzyme activities were determined in an identical manner as described above except
that 10 µl of the incubation mixture that contained D102N trypsin was added directly to
0.99 ml of the assay buffer (50 nM D102N trypsin). After 3 hours of incubation with
TLCK, the loss of catalytic activity of both enzymes at both pH values was complete.
Since there was no activity of either enzyme after 2.5 hours of further incubation with a
large excess of substrate over TLCK, the inhibition was probably due to formation of a
covalent bond between the enzyme and the inhibitor and not to a slow off-rate for a
noncovalently bound competitive inhibitor.

28. F. J. Kezdy and M. L. Bender, Biochemistry 1, 1097 (1962).

29. Substrate concentrations were calculated by using the total change in absorbance at
324 nm when reactions were run at pH 7.5 [ε324 = 19,800 M⁻¹ cm⁻¹ (13)] and catalyzed
by 1 nM trypsin so that the reaction would be completed within several minutes.
The molar absorptivities over the pH range of 4.1 to 10.6 were determined by repeating
these reactions with known substrate concentrations at various pH values. It was found
that at pH values below 7.6 the molar absorptivity remained at 19,800 M⁻¹ cm⁻¹,
but that this value decreased sigmoidally above pH 7.6, with an apparent pKa of 8.7,
presumably reflecting the ionization of thiopyridine. These molar absorptivity values
were used to convert rates of reaction into molar rates at the appropriate pH
values.
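
The conversion described in this note is an application of the Beer-Lambert law. The following is a minimal sketch, assuming a 1 cm path length and the absorptivity of 19,800 M⁻¹ cm⁻¹ reported for pH values below 7.6. The function names are hypothetical, and above pH 7.6 the measured, pH-dependent absorptivity (not tabulated in the note) would have to be passed in as `epsilon`.

```python
EPSILON_324 = 19_800.0  # M^-1 cm^-1 at 324 nm; stated to hold below pH 7.6


def concentration_from_absorbance(delta_a, epsilon=EPSILON_324, path_cm=1.0):
    """Molar concentration from a total absorbance change (Beer-Lambert law)."""
    return delta_a / (epsilon * path_cm)


def molar_rate(delta_a_per_min, epsilon=EPSILON_324, path_cm=1.0):
    """Convert an absorbance slope (A/min) into a molar rate (M/min)."""
    return delta_a_per_min / (epsilon * path_cm)


# Illustrative input (not a value from the paper): a total change of 0.198 A.
conc = concentration_from_absorbance(0.198)
print(f"substrate = {conc * 1e6:.1f} uM")
```

Passing the absorptivity as a parameter keeps the same two functions usable across the whole pH range once the sigmoidal decrease above pH 7.6 has been measured.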



The concentration of viable active sites in native enzyme preparations was determined
by active site titrations with 4-methylumbelliferyl p-guanidinobenzoate (MUGB) by
using 4-methylumbelliferone as a standard (30) as described in the legend to Table 1.
Titrations of D102N trypsin proved to be too slow to measure active sites accurately.
Mutant enzyme concentrations were thus determined in duplicate by absorbance at
280 nm (ε280 = 38,000 M⁻¹ cm⁻¹). The accuracy of this molar absorptivity value was
confirmed by amino acid analysis with norleucine as an internal standard.

A danger in following the activity of D102N trypsin is that an unknown proportion of
the activity may be due to trypsin that has formed through deamidation. This does not
appear to be a problem at pH values less than 8, where the activity of the mutant enzyme
is less than 0.1% that of trypsin. At alkaline pH values, where the activity of the mutant
becomes significant, the possibility of activity resulting from deamidation becomes
greater. However, assays with 100 nM D102N trypsin and 60 µM Z-Lys-S-Bzl as
substrate at pH 7.16 and pH 10.24 after prior incubation of the enzyme in buffers at
either pH value for 1 hour gave initial rates of reaction of 1.00 ± 0.04 min⁻¹ and
249 ± 5 min⁻¹, respectively. These results indicate that significant deamidation of the
D102N residue to an aspartic acid did not occur in the pH and time ranges studied.

30. G. W. Jameson, D. V. Roberts, R. W. Adams, S. A. Kyle, D. T. Elmore,
Biochem. J. 131, 107 (1973).

31. D. V. Roberts, in Enzyme Kinetics (Cambridge Univ. Press, Cambridge, 1977), p. 299.

32. K. Yamaoka et al., J. Pharmacobio-Dyn. 4, 879 (1981).

33. We thank J. F. Kirsch and E. T. Kaiser for helpful discussions and L. Spector for
preparing the manuscript. Support by NSF grant PCM830610 (W.J.R.) and DMB860S086
(C.S.C.), and a Bristol Meyer grant of Research Corporation (C.S.C.) is gratefully
acknowledged. An NIH postdoctoral fellowship was awarded to S.R. (GM 10765).

29 September 1986; accepted 29 May 1987 



Adrenal Medulla Grafts Enhance Recovery of Striatal 
Dopaminergic Fibers 



Martha C. Bohn,* Lisa Cupit, Frederick Marciano, 
Don M. Gash 



The drug 1-methyl-4-phenyl-1,2,5,6-tetrahydropyridine (MPTP) depletes striatal
dopamine levels in primates and certain rodents, including mice, and produces
parkinsonian-like symptoms in humans and nonhuman primates. To investigate the
consequences of grafting adrenal medullary tissue into the brain of a rodent model of
Parkinson's disease, a piece of adult mouse adrenal medulla was grafted unilaterally
into mouse striatum 1 week after MPTP treatment. This MPTP treatment resulted in
the virtual disappearance of tyrosine hydroxylase-immunoreactive fibers and severely
depleted striatal dopamine levels. At 2, 4, and 6 weeks after grafting, dense tyrosine
hydroxylase-immunoreactive fibers were observed in the grafted striatum, while only
sparse fibers were seen in the contralateral striatum. In all cases, tyrosine hydroxylase-
immunoreactive fibers appeared to be from the host rather than from the grafts, which
survived poorly. These observations suggest that, in mice, adrenal medullary grafts
exert a neurotrophic action in the host brain to enhance recovery of dopaminergic
neurons. This effect may be relevant to the symptomatic recovery in Parkinson's
disease patients who have received adrenal medullary grafts.



In humans, the drug 1-methyl-4-phenyl-1,2,5,6-tetrahydropyridine (MPTP) produces
motor deficits that closely resemble those observed in Parkinson's disease (1-4). This
observation has led to the development of animal models of Parkinson's disease that are
valuable for studying the effects of brain grafting (5). MPTP damages the dopamine
(DA)-containing A9 cell group in the pars compacta of the substantia nigra and results
in a degeneration of the nigrostriatal DA fibers and loss of striatal DA and its
metabolites (1-8). The severity of this damage is species-dependent. In primates, MPTP
treatment damages both the DA fibers and cell bodies (1-5). In mice, the fibers are
damaged, but many A9 neurons survive (6, 7). Because the MPTP lesion is transient in
mouse (7, 9), the MPTP-treated mouse provides an opportunity for studying recovery of
identified neurons in the brain. Our study suggests that striatal grafts of adult mouse
adrenal medulla enhance recovery of these neurons.

Two MPTP treatments were compared for their effects on striatal DA levels and tyrosine
hydroxylase-immunoreactivity (TH-IR) in the striatum and A9 region of C57BL/6 mice
(6 to 12 weeks old; 21 to 28 g). As described (6, 7), lightly etherized mice received
multiple injections of MPTP-HCl subcutaneously in 0.5 ml of saline. Group A received
three injections of 30 mg per kilogram of body weight at 24-hour intervals and group B
received two injections of 50 mg per kilogram of body weight 16 hours apart.
Catecholamines in tissues were isolated and measured



M. C. Bohn and L. Cupit, Department of Neurobiology
and Behavior, State University of New York, Stony
Brook, NY 11794.

F. Marciano and D. M. Gash, Department of Neurobiology
and Anatomy, University of Rochester School of
Medicine, Rochester, NY 14642.



*To whom correspondence should be addressed. 



21 AUGUST 1987 






Eur. J. Biochem. 267, 6931-6937 (2000) © FEBS 2000



Localization of the mosaic transmembrane serine protease corin to 
heart myocytes 

John D. Hooper, Anthony L. Scarman, Belinda E. Clarke, John F. Normyle and Toni M. Antalis

Cellular Oncology Laboratory, Queensland Institute of Medical Research, Brisbane, Queensland, Australia;
Department of Anatomical Pathology, The Prince Charles Hospital, Chermside, Queensland, Australia



Corin cDNA encodes an unusual mosaic type II transmembrane serine protease, which possesses, in addition to a
trypsin-like serine protease domain, two frizzled domains, eight low-density lipoprotein (LDL) receptor domains, 
a scavenger receptor domain, as well as an intracellular cytoplasmic domain. In in vitro experiments, recombinant 
human corin has recently been shown to activate pro-atrial natriuretic peptide (ANP), a cardiac hormone essential 
for the regulation of blood pressure. Here we report the first characterization of corin protein expression in heart 
tissue. We generated antibodies to two different peptides derived from unique regions of the corin polypeptide, 
which detected immunoreactive corin protein of approximately 125-135 kDa in lysates from human heart 
tissues. Immunostaining of sections of human heart showed corin expression was specifically localized to the 
cross striations of cardiac myocytes, with a pattern of expression consistent with an integral membrane 
localization. Corin was not detected in sections of skeletal or smooth muscle. Corin has been suggested to be a 
candidate gene for the rare congenital heart disease, total anomalous pulmonary venous return (TAPVR) as the 
corin gene colocalizes to the TAPVR locus on human chromosome 4. However examination of corin protein 
expression in TAPVR heart tissue did not show evidence of abnormal corin expression. The demonstrated corin 
protein expression by heart myocytes supports its proposed role as the pro- ANP convertase, and thus a potentially 
critical mediator of major cardiovascular diseases including hypertension and congestive heart failure. 

Keywords: serine protease; corin; heart; pro-atrial natriuretic peptide (pro- ANP); TAPVR. 



Serine proteases are found in all living organisms, ranging from 
viruses to humans [1], where they serve important and varied 
biological functions in situations requiring limited proteolysis. 
Their activities impact on areas as diverse as hemostasis, tissue 
remodelling and wound repair, inflammation, angiogenesis, 
fibrinogenesis and fibrinolysis. Cell surface serine proteases 
have been associated largely with extracellular matrix degra- 
dation, but there are emerging roles for these proteases in 
generating bioactive matrix protein fragments, influencing the 
release, the activation and bioavailability of growth factors and 
in shedding of cell surface proteins [2-6].

Many serine proteases are mosaic proteins comprising
multiple, structurally distinct domains necessary for regulating
enzymatic activity. Circulating serine proteases of the blood
coagulation (e.g. prothrombin and factor X) [7], fibrinolysis
(e.g. plasminogen activators) [8] and complement (e.g. C1r and
C1s) [9] systems are well characterized examples of mosaic
proteins. While the vast majority of known serine proteases are
secreted, more recently some serine proteases have been found
to possess integral transmembrane domains. The proteins
enteropeptidase [10], hepsin [11] and, most recently, TMPRSS2
[12] are examples of mosaic serine proteases with type II
transmembrane domains. These enzymes are positioned on the
plasma membrane via a membrane spanning domain close to
the N-terminus. In addition to membrane spanning and protease
domains, enteropeptidase also contains two low-density
lipoprotein (LDL) receptor domains, a meprin-like domain, two
C1r-like domains and a truncated scavenger receptor domain.
An LDL receptor domain and a scavenger receptor domain
have also been identified in TMPRSS2 [12]. The functions of
these domains have not been determined.

Correspondence to T. M. Antalis, Queensland Institute of Medical
Research, Post Office Royal Brisbane Hospital, Brisbane, 4029,
Queensland, Australia. Fax: +61 7 3362 0107, Tel.: +61 7 3362 0312,
E-mail: toniA@qimr.edu.au

Abbreviations: LDL, low-density lipoprotein; ANP, atrial natriuretic
peptide; TAPVR, total anomalous pulmonary venous return; tPA,
tissue-type plasminogen activator; uPA, urokinase-type plasminogen
activator; ang, angiotensin; ACE, angiotensin converting enzyme.
(Received 24 July 2000, revised 12 September 2000, accepted
4 October 2000)

Serine proteases play important roles in several aspects of
heart physiology and cardiovascular disease [13]. The mast cell
serine protease chymase is believed to be the major converter of
angiotensin (ang)I to angII in human heart tissue [14]. The
involvement of angII in normal cardiac function as well as in
heart ailments such as hypertrophy, heart failure and ischaemic
heart disease is indicated by the finding that inhibition of the
angiotensin converting enzyme (ACE) leads to beneficial
outcomes for sufferers of these diseases [15]. However, ACE
inhibitors block only 10-20% of angI conversion in heart tissue
whereas the remaining activity is blocked by serine protease
inhibitors [16]. The fibrinolytic serine proteases tissue-type
plasminogen activator (tPA) and urokinase-type plasminogen
activator (uPA) are also thought to be involved in the
progression of heart disease. uPA is present at significantly
elevated levels in the atherosclerotic lesions responsible for
myocardial infarction and failure [17]. The reduction in tPA
from arteriolar smooth muscle cells is linked to the
development of coronary artery disease in transplanted hearts [18].

Our own work and that of Yan et al. [19] has led to the recent
cloning of a cDNA encoding a novel, multidomain type II
transmembrane serine protease from human heart. The
predicted protein, corin, comprises two frizzled domains, eight
LDL receptor domains, a truncated scavenger receptor domain,
in addition to the extracellular trypsin-like serine protease
domain [19]. Recent expression of recombinant corin
demonstrates that it possesses pro-atrial natriuretic peptide (ANP)
convertase activity [20], and thus may play a critical role in the
regulation of hypertension. In situ hybridization studies of
mouse embryonic heart showed that corin mRNA was
expressed as early as day 9.5 and maintained its expression
through the adult animal [19]. The corin gene was mapped to
human chromosome 4p12-13 [19], near the locus for the
congenital heart disease, total anomalous pulmonary venous
return (TAPVR). Here we present data describing for the first
time native corin protein expression and localization in human
heart.

MATERIALS AND METHODS 

Identification of corin cDNA by homology cloning 

Homology cloning was performed by RT-PCR using degenerate
oligonucleotides corresponding to conserved regions of serine
proteases [21-24]. Total RNA was isolated from Sla cells [25]
following treatment with TNFα and cycloheximide for 4 h.
RNA (5 µg) was reverse transcribed at 42 °C using AMV
reverse transcriptase (Promega, Madison, WI) in the presence of
oligo dT12-18 (0.25 µg·µL⁻¹) (Pharmacia Biotech, Sweden),
50 mM Tris/HCl, pH 8.3, 50 mM KCl, 10 mM MgCl2, 10 mM
dithiothreitol and 0.5 mM spermidine in a total volume of
20 µL. PCR was performed using 1 µL of the reverse
transcriptase reaction mixture, 500 ng of each primer, 10 mM
Tris/HCl, pH 8.3, 50 mM KCl, 1.5 mM MgCl2, 0.2 mM dNTPs
and 1-2 units of Taq polymerase (Perkin Elmer). The primers
were as follows. Forward, 5'-ACAGAATTCTGGGTIGTIACIGCIGCICAYTG-3';
reverse, 5'-ACAGAATTCAXIGGICCICCI(C/G)(T/A)XTCICC-3'
(where X = A or G; Y = C or T; I = inosine).
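
The degeneracy of these primers can be counted directly from the notation. The sketch below assumes the paper's own codes (X = A or G, Y = C or T, and bracketed alternatives such as (C/G)), treats inosine (I) as a single universal base rather than a source of degeneracy, and uses a hypothetical helper name, `degeneracy`.

```python
import re

# Degenerate codes as defined in the Methods (not the IUPAC alphabet).
CODES = {"X": 2, "Y": 2}  # X = A or G; Y = C or T


def degeneracy(primer: str) -> int:
    """Count the distinct sequences encoded by a degenerate primer."""
    count = 1
    # Bracketed alternatives such as (C/G) contribute one factor per group.
    for group in re.findall(r"\(([^)]*)\)", primer):
        count *= len(group.split("/"))
    # Remove the bracketed groups, then count single-letter degenerate codes.
    plain = re.sub(r"\([^)]*\)", "", primer)
    for base in plain:
        count *= CODES.get(base, 1)  # A, C, G, T and I each count once
    return count


forward = "ACAGAATTCTGGGTIGTIACIGCIGCICAYTG"
reverse = "ACAGAATTCAXIGGICCICCI(C/G)(T/A)XTCICC"
print(degeneracy(forward), degeneracy(reverse))
```

Using inosine at fully ambiguous positions keeps the pools small: under these assumptions the forward primer is a 2-fold and the reverse primer a 16-fold mixture.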

Cycling conditions: 2 cycles of 94 °C for 2.5 min, 35 °C for
2.5 min and 72 °C for 3 min, followed by 33 cycles of 94 °C
for 2.5 min, 57 °C for 2.5 min and 72 °C for 3 min, with a final
extension at 72 °C for 7 min. PCR products of approximately
450 bp were ligated into pGEM-T (Promega, Madison, WI,
USA), cloned and analysed by DNA sequencing. A DNA
fragment was identified which represented the partial corin
sequence (nucleotides 334-748). The cDNA was extended 333
nucleotides towards the 5' end by screening a cDNA library
using two rounds of PCR and the nested oligonucleotides
ATC2P3 and ATC2P1 in combination with the vector specific
primer T7. The 3' end was extended to nucleotide 976 by two
rounds of PCR and the nested oligonucleotides ATC2P4 and
ATC2P5 in combination with the vector specific primer T3. The
primer sequences are given below.

ATC2P1: 5'-GCGTGTCTGCATGAACACTG-3';
ATC2P2: 5'-ATGCCAAGCACCACTTTCCA-3';
ATC2P3: 5'-ATAGTCCACCACTGCTCGAC-3';
ATC2P4: 5'-TTAAGCTGCAAGAGGGAGAG-3'.

The DNA sequence of this cDNA has been deposited in
the DDBJ/GenBank/EMBL database under accession no.
AF113248.

Heart tissue specimens 

Tissues from explanted hearts with terminal heart failure were 
either snap frozen in liquid nitrogen (for RNA and protein 
analyses) or processed for routine histological examination. Six 



paraffin embedded blocks of human heart tissue were obtained 
from autopsy cases with acute myocardial infarction. These 
blocks included both viable and nonviable myocardium. 
Procedures were in accordance with guidelines established by 
the National Health and Medical Research Council of Australia, 
Ethics Approval number EC9876(n). 

Northern and Poly(A)+ RNA dot blot analyses

Human multiple tissue northern blots (Clontech, Palo Alto, CA,
USA) contained 2 µg of poly(A)+ RNA per lane. The blots
were hybridized with a 32P-dCTP labeled EcoRI digested DNA
fragment encoding corin cDNA in ExpressHyb (Clontech)
solution at 65 °C and washed to a final stringency of
0.2 × NaCl/Cit, 0.1% SDS at 65 °C. The blot was reprobed
with β-actin as a measure of loading in each lane. For the
mouse tissue blot, total RNA was purified from mouse tissues,
separated by denaturing gel electrophoresis and transferred to
Hybond-N nylon membranes as described [26]. The blot was
hybridized with the radiolabelled human corin DNA probe
under lower stringency conditions in ExpressHyb solution at
55 °C and washed to a final stringency of 1 × NaCl/Cit, 0.1%
SDS at 55 °C. The mouse tissue blot was stained with ethidium
bromide to confirm RNA loading in each lane.

Production of affinity purified antipeptide polyclonal
antibodies

Rabbit polyclonal antibodies were generated against corin
specific peptides derived from nonhomologous hydrophilic
regions within the corin amino-acid sequence. Two peptides,
each containing a cysteine residue incorporated at the C-terminus,
were synthesized (Auspep, Parkville, Australia) and conjugated
to keyhole limpet hemocyanin using m-maleimidobenzoic acid
N-hydroxysuccinimide ester. The peptides were:
A1: IQEQEKEPRWLTLHSNWE-C; A2: GHMGNKMPFKLQEGE-C.
Rabbit antisera were peptide-affinity purified using SulfoLink
coupling gel (Pierce, Rockville, IL). The specificity of each
antibody was tested against the immunogenic peptide by
ELISA.

Western blot analysis 

Frozen heart tissue (100 mg) was homogenized in lysis-binding 
buffer (Dynabeads mRNA Direct kit, Dynal) and spun at 
13 000 × g for 2 min. The protein pellet was dissolved in
reducing SDS-sample buffer for Western blot analysis. Proteins 
were separated by SDS/PAGE on 10% acrylamide gels and 
transferred electrophoretically to Hybond-P membranes 
(Amersham, Aylesbury, UK). Membranes were blocked with 
5% nonfat skim milk powder in Tris/NaCl (10 mM Tris/HCl, 
pH 7.0, 150 mM NaCl), incubated with affinity purified anti- 
peptide antibody, then with horseradish peroxidase conjugated 
sheep anti-(rabbit Ig) secondary antibody, and visualized by 
enhanced chemiluminescence (Amersham, Aylesbury, UK). 

Immunohistochemistry

Paraffin sections (5 µm) of formalin-fixed human heart were
deparaffinized, then rehydrated before antigen retrieval in
boiling 10 mM citric acid buffer, pH 6. After cooling,
endogenous peroxidase activity was inhibited by 10 min
incubation in 1% hydrogen peroxide. Non-specific antibody
binding was blocked by incubating the sections in 4% nonfat
skim milk powder in NaCl/Pi for 15 min, followed by 10%



© FEBS 2000 



Corin is expressed by human myocytes (Eur. J. Biochem. 267) 6933



Fig. 1. Corin expression in human and
mouse tissues. (A) Northern blot analysis of
RNA isolated from a range of normal human
tissues probed with ³²P-labelled corin cDNA.
The levels of β-actin mRNA are shown as a
control for loading. (B) Northern blot analysis
of corin mRNA expression in a range of mouse
tissues probed with ³²P-labelled human corin
cDNA at reduced stringency. The levels of
18S ribosomal RNA are shown as a control
for loading.



[Fig. 1 blot images. Lane labels are not recoverable. Panel labels: Human Corin with β-actin; Mouse Corin with 18S rRNA; size markers at 7.5 and 4.4.]




normal goat serum for 20 min. Affinity purified anticorin A1
(1 : 100; 150 µg·mL⁻¹) or A2 antibodies (1 : 50;
20 µg·mL⁻¹) were applied and incubated overnight in a
humidified chamber at room temperature. Controls included
sections incubated with no primary antibody or antibody that
had been preadsorbed for 2 h at room temperature with 1 µg of
the antigenic peptide. Following incubation with prediluted
biotinylated goat anti-(rabbit Ig) Ig (Zymed, San Francisco,
CA, USA), streptavidin-horseradish peroxidase (Zymed) was
applied and color developed using the chromogen
3,3'-diaminobenzidine with hydrogen peroxide as substrate. The sections
were counterstained in Mayer's haematoxylin.



RESULTS AND DISCUSSION 

Isolation of human corin cDNA by homology cloning 

A PCR-based homology cloning approach was employed to
identify serine protease cDNAs expressed by the Sla cell line
[25] which is resistant to tumor necrosis factor-α induced
apoptosis. Degenerate primers designed to anneal to cDNA
encoding the conserved regions surrounding the catalytic
histidine and serine amino acids of serine proteases [21-23]
were used to amplify and then clone a range of DNA fragments
of approximately 450 bp. One clone, designated ATC2, was
found to encode a novel serine protease. The cDNA was
extended in the 5' and 3' directions by library screening and the
DNA sequence was deposited in the DDBJ/Genbank/EMBL
database (accession no. AF113248). This sequence was
subsequently determined to be 100% identical to a recently
reported cDNA encoding the serine protease, corin (accession
no. AF133845) [19].



Corin mRNA is strongly expressed in heart 

The tissue distribution of corin mRNA was examined by
Northern blot analyses. Analysis of poly(A)+ RNA from 16
normal human tissues showed a single transcript of approximately
5.1 kb detectable only in human heart (Fig. 1A).
Examination of a range of mouse tissues also demonstrated
specific expression of corin mRNA of approximately 5.1 kb
only in mouse heart (Fig. 1B).







Fig. 2. Corin protein expression in human heart tissue by Western blot
analysis. Immunoreactive corin protein of 125-135 kDa is detected in a
protein lysate prepared from human heart tissue (Patient #7684), which is
not detectable in a corin negative HeLa cell lysate. The blot was probed
with anticorin antibody, AbA1, and visualized using enhanced
chemiluminescence. The protein standards in kDa are as indicated.



6934 J. D. Hooper et al. (Eur. J. Biochem. 267)




Fig. 3. Corin is localized to human heart myocytes by immunostaining. Immunohistochemical staining of human heart tissues was performed using the
affinity purified anticorin peptide A1 or A2 polyclonal antibodies as primary antibodies. (A) a longitudinal section of a representative heart tissue from a
transplant recipient (Patient #7684) stained with AbA1 showing intense staining in the cardiac myocytes; (B) as (A) except the primary antibody was
preadsorbed with the immunogenic peptide, A1, for 2 h; (C) the same tissue as (A) except stained with the weaker staining antibody, AbA2. Apparent
staining at the poles of the nuclei are deposits of the brown lipochrome pigment, lipofuscin. (D) the same tissue as (A-C) processed in the absence of primary
antibody; (E) a longitudinal section of normal myocardium from a heart which contained an acute infarct elsewhere (Patient #A4-99R) stained with AbA1
showing intense staining corresponding to the cross striations; (F) staining of the same heart tissue as (E) with AbA1 showing intense staining in cross
section. Photomicrographs (A-E) were taken at an original magnification of 100x.



Anti-corin antibodies detect corin in heart lysates 

We generated polyclonal antibodies to two different peptides
derived from unique regions of the corin polypeptide
sequence in order to investigate its expression and localization
in the heart. The first was a unique region within the serine
protease catalytic domain between the conserved Asp and Ser
amino-acid residues (AbA1) and the second was contained
within the scavenger receptor domain (AbA2). Immunoblot
analysis of corin protein expression in human heart protein
lysates showed a major immunoreactive band of 125-135 kDa
(Fig. 2), which was not present in lysates from the negative
control HeLa cell line. This molecular mass is slightly lower
than that reported (~150 kDa) for recombinant V5/His6







Fig. 4. Corin expression in neonate heart with TAPVR. Immunohistochemical staining of human neonate heart tissues was performed using the affinity
purified anticorin peptide A1 polyclonal antibody as the primary antibody. (A) and (C) longitudinal sections of TAPVR heart tissue showing staining in the
cardiac myocytes, corresponding to the cross striations; (B) and (D) longitudinal sections of a normal neonate heart showing a similar staining pattern in the
cardiac myocytes. Photomicrographs (A) and (B) were taken at an original magnification of 100x and (C) and (D) were taken at an original magnification of
40x.



tagged corin expressed by human embryonic kidney 293 cells 
[20]. As the mature corin zymogen has a calculated mass of 
116 kDa [19], it is likely that the mature corin polypeptide 
undergoes a post-translational processing event, possibly 
glycosylation. Consistent with this, there are 19 predicted 
N-linked glycosylation sites present in the extracellular 
domains of corin [19]. 
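
The mass comparison and the notion of a predicted N-linked site can be made concrete with a short sketch. It is illustrative only and is not the prediction method used in [19]. It assumes the common N-X-S/T sequon rule (Asn, then any residue except Pro, then Ser or Thr). The names `find_sequons` and `extra_mass_kda` are hypothetical, and the test sequence is a made-up toy peptide, not the corin sequence.

```python
import re

# N-X-S/T sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. in "NNSS") all be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein):
    """Return (1-based position, tripeptide) for each N-X-S/T sequon."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]


def extra_mass_kda(observed_range, calculated):
    """Mass above the calculated polypeptide mass, as a (low, high) pair in kDa."""
    low, high = observed_range
    return (low - calculated, high - calculated)


# Hypothetical toy peptide, NOT the corin sequence.
toy = "MKNGSAANPTLLNKTQ"
print(find_sequons(toy))

# Values taken from the text: 125-135 kDa observed, 116 kDa calculated.
print(extra_mass_kda((125, 135), 116))
```

In the toy peptide, NGS and NKT qualify while NPT is rejected because of the proline. With the values quoted above, the band runs 9-19 kDa above the calculated mass, which is the gap that glycosylation is proposed to account for.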



Corin is expressed by human heart myocytes 

To investigate the localization of corin expression in human 
heart, immunohistochemical analyses were performed on 
human adult heart tissues. Corin was abundantly expressed 
in cardiac myocytes, with intense brown staining associated 
with cross striations seen in longitudinally sectioned myofibers 
(Fig. 3A). In some areas there was accentuation of the plasma 
membrane, consistent with an integral membrane localization 
of corin. This same pattern of staining was observed in sections 
taken from all areas of the myocardium. Control slides using
the AbA1 polyclonal antibody in the presence of competing
A1 peptide showed absence of this specific staining pattern
(Fig. 3B). An identical, albeit weaker, staining pattern was
observed in experiments performed using the second
corin-specific antibody (AbA2) (Fig. 3C). No staining was detected
in the absence of antibody (Fig. 3D). Staining of a section of



viable myocardium from a heart containing an acute myocar- 
dial infarct showed a similar intense staining of the striations 
in cardiac myocytes (Fig. 3E) and a pinhead-like dot pattern 
when viewed in cross section (Fig. 3F). Necrotic heart tissue 
showed similar but much less intense staining (data not shown). 
Corin was not detected in sections of skeletal or smooth muscle 
(data not shown), suggesting that the function of corin is 
specifically related to cardiac muscle. 



Corin protein expression in a patient with the congenital 
heart disease, TAPVR 

The molecular mechanisms responsible for the developmental 
defect associated with the rare congenital heart disease TAPVR 
are not known. The location of the corin gene on human
chromosome 4p12-13 [19] and the localization of the TAPVR
locus to a 30 centimorgan interval on 4p13-q12 [27] suggested
that corin may be a candidate for the TAPVR gene [19]. If corin
plays a role in TAPVR, its expression may be lost or altered in 
TAPVR heart tissue. To explore this possibility, we examined 
corin protein expression in a TAPVR heart. The pattern of corin 
expression detected in this heart tissue (Fig. 4A,C) was similar 
to that observed in the adult heart and was identical to the 
pattern of corin staining in an age-matched neonate control 
heart (Fig. 4B,D). While this data is not consistent with a role 









[Fig. 5 diagram. Proteins shown: Corin, Enteropeptidase, TMPRSS2, Hepsin. Legend labels: Transmembrane Domain, Serine Proteinase Domain, Truncated Scavenger Domain, LDL-receptor-like Domains, Frizzled Domain, Receptor Cysteine-Rich Domain, Meprin-like Domain, C1r-like Domain.]



Fig. 5. Diagram showing domain structures of corin compared with other mosaic integral membrane proteins. The domains are as indicated. The
catalytic serine protease residues are circled. The disulfide bonds linking the catalytic and pro-regions are marked.



for corin in TAPVR, it does not exclude the possibility that 
TAPVR is associated with more subtle alterations to the corin
gene, for example point mutations, that would not be detected
by this method.

Corin homology to other type II transmembrane proteases 

As illustrated in Fig. 5, corin is a mosaic integral membrane 
protein possessing discrete domains. The intracellular, cyto- 
plasmic domain contains two potential protein kinase C phos- 
phorylation sites which may represent mechanisms for signal 
relay to or from the cell surface. Corin contains two frizzled 
domains. These domains function in other molecules as 
receptors for Wnt proteins, which are implicated in signal 
transduction during development [28]. Corin possesses eight 
LDL receptor domains which can mediate uptake of LDLs [29] 
and have also been shown to be involved in binding and 
internalization of protease/inhibitor complexes [30]. LDLs 
regulate the transport of cholesterol and play a major role in 
the development of heart disease. Corin possesses a scavenger 
receptor domain, which in other proteins, binds polyanionic 
molecules including modified lipoproteins, cell surface lipids 
and some sulfated polysaccharides [31]. The trypsin-like serine 
protease domain is located at the C-terminus. 

Corin bears similarity to other known members of the 
integral membrane serine proteases as illustrated in Fig. 5. The 
corin serine protease domain is highly homologous to a 
multidomain integral-membrane serine protease found in the 
brush border of the intestine, enteropeptidase [32]. Entero- 
peptidase functions to activate digestive pancreatic enzymes 
released from the intestine. Activation of this cascade is critical, 
as illustrated by the life-threatening intestinal malabsorption 
that accompanies congenital deficiency of enteropeptidase [32]. 
Other proteases with homology to the corin serine protease 
domain are the integral-membrane serine proteases, TMPRSS2 
and hepsin. Hepsin is a hepatic serine protease that has been 
demonstrated to activate Factor VII in the extrinsic blood 
coagulation pathway leading to thrombin formation, and has 
further been shown to be required for mammalian cell growth 
[33]. 

In summary, we have confirmed heart as a site of abundant 
corin mRNA expression and demonstrated for the first time the 
expression of corin as a 125-135 kDa protein in this tissue. In 



addition, in heart we have localized corin protein to myocytes,
the same cardiac cells that express pro-ANP. These data support
recently reported in vitro evidence that the corin proteolytic 
domain is the pro-ANP convertase [20] and thus, the proposal 
that corin has a role in regulating blood pressure. Possible 
additional functions of the serine protease domain and the 
functions of the other corin domains are not yet known. The 
putative phosphorylation sites in the cytoplasmic domain of 
corin may indicate that the intracellular domain of corin will be 
a target for phosphorylation and therefore may mediate 
signalling events from the cell surface. A better understanding 
of the role of corin in heart will provide insight into basic 
molecular mechanisms of cardiac function and could provide a 
rational target for both diagnostic and therapeutic applications. 

ACKNOWLEDGEMENTS 

This work was supported by grants from the Queensland Cancer Fund, 
Brisbane, Australia and the National Health and Medical Research Council 
of Australia. J. D. H. was supported by a John Earnshaw Scholarship from
the Queensland Cancer Fund and by the Bancroft Scholarship, Queensland 
Institute of Medical Research. 

REFERENCES 

1. Rawlings, N.D. & Barrett, A.J. (1994) Families of serine peptidases.
Methods Enzymol. 244, 19-61.

2. Murphy, G. & Gavrilovic, J. (1999) Proteolysis and cell migration:
creating a path? Curr. Opin. Cell Biol. 11, 614-621.

3. LeMosy, E.K., Hong, C.C. & Hashimoto, C. (1999) Signal transduction
by a protease cascade. Trends Cell Biol. 9, 102-107.

4. Rifkin, D.B., Mazzieri, R., Munger, J.S., Noguera, I. & Sung, J. (1999)
Proteolytic control of growth factor availability. Acta Path. Microbiol.
Immunol. Scand. 107, 80-85.

5. Dery, O. & Bunnett, N.W. (1999) Proteinase-activated receptors: a
growing family of heptahelical receptors for thrombin, and tryptase.
Biochem. Soc. Trans. 27, 246-254.

6. Noel, A., Gilles, C., Bajou, K., Devy, L., Kebers, F., Lewalle, J.M.,
Maquoi, E., Munaut, C., Remacle, A. & Foidart, J.M. (1997)
Emerging roles for proteinases in cancer. Invasion Metastasis 17,
221-239.

7. Ichinose, A. & Davie, E.W. (1994) The Blood Coagulation Factors:
Their cDNAs, Genes, and Expression. In Hemostasis and
Thrombosis: Basic Principles and Clinical Practice (Colman, R.W.,
Hirsh, J., Marder, V.J. & Salzman, E.W., eds), pp. 19-54. J.B.
Lippincott Company, Philadelphia, PA, USA.

8. Francis, C.W. & Marder, V.J. (1994) Physiologic Regulation and
Pathologic Disorders of Fibrinolysis. In Hemostasis and Thrombosis:
Basic Principles and Clinical Practice (Colman, R.W., Hirsh, J.,
Marder, V.J. & Salzman, E.W., eds), pp. 1076-1103. J.B. Lippincott
Company, Philadelphia, PA, USA.

9. Arlaud, G.J. & Thielens, N.M. (1993) Human complement serine
proteases C1r and C1s and their proenzymes. Methods Enzymol. 223,
61-82.

10. Kitamoto, Y., Veile, R.A., Donis-Keller, H. & Sadler, J.E. (1995)
Human complement serine proteases C1r and C1s and their
proenzymes. Biochemistry 34, 4562-4568.

11. Tsuji, A., Torres-Rosado, A., Arai, T., Le Beau, M.M., Lemons, R.S.,
Chou, S.H. & Kurachi, K. (1991) Hepsin, a cell membrane-associated
protease. Characterization, tissue distribution, and gene localization.
J. Biol. Chem. 266, 16948-16953.

12. Paoloni-Giacobino, A., Chen, H., Peitsch, M.C., Rossier, C. &
Antonarakis, S.E. (1997) Cloning of the TMPRSS2 gene, which
encodes a novel serine protease with transmembrane, LDLRA, and
SRCR domains and maps to 21q22.3. Genomics 44, 309-320.

13. Schussheim, A.E. & Fuster, V. (1997) Thrombosis, antithrombotic
agents, and the antithrombotic approach in cardiac disease. Prog.
Cardiovascular Diseases 40, 205-238.

14. Balcells, E., Meng, Q.C., Johnson, W.H. Jr, Oparil, S. & Dell'Italia,
L.J. (1997) Angiotensin II formation from ACE and chymase in
human and animal hearts: methods and species considerations. Am. J.
Physiol. 273, H1769-H1774.

15. Wolny, A., Clozel, J.P., Rein, J., Mory, P., Vogt, P., Turino, M.,
Kiowski, W. & Fischli, W. (1997) Functional and biochemical
analysis of angiotensin II-forming pathways in the human heart.
Circ. Res. 80, 219-227.

16. Bumpus, F.M. (1991) Angiotensin I and II. Some early observations
made at the Cleveland Clinic Foundation and recent discoveries
relative to angiotensin II formation in human heart. Hypertension 18,
122-125.

17. Kienast, J., Padro, T., Steins, M., Li, C.X., Schmid, K.W., Hammel, D.,
Scheld, H.H. & Van De Loo, J.C. (1998) Relation of urokinase-type
plasminogen activator expression to presence and severity of
atherosclerotic lesions in human coronary arteries. Thromb. Haemost.
79, 579-586.

18. Labarrere, C.A., Pitts, D., Nelson, D.R. & Faulk, W.P. (1995) Vascular
tissue plasminogen activator and the development of coronary
artery disease in heart-transplant recipients. N. Engl. J. Med. 333,
1111-1116.

19. Yan, W., Sheng, N., Seto, M., Morser, J. & Wu, Q. (1999) Corin, a
mosaic transmembrane serine protease encoded by a novel cDNA
from human heart. J. Biol. Chem. 274, 14926-14935.

20. Yan, W., Wu, F., Morser, J. & Wu, Q. (2000) Corin, a transmembrane
cardiac serine protease, acts as a pro-atrial natriuretic
peptide-converting enzyme. Proc. Natl Acad. Sci. USA 97, 8525-8529.



21. Sakanari, J.A., Staunton, C.E., Eakin, A.E., Craik, C.S. & McKerrow,
J.H. (1989) Serine proteases from nematode and protozoan parasites:
isolation of sequence homologs using generic molecular probes.
Proc. Natl Acad. Sci. USA 86, 4863-4867.

22. Elvin, C.M., Whan, V. & Riddles, P.W. (1993) A family of serine
protease genes expressed in adult buffalo fly (Haematobia irritans
exigua). Mol. Gen. Genet. 240, 132-139.

23. Elvin, C.M., Vuocolo, T., Smith, W.J., Eisemann, C.H. & Riddles, P.W.
(1994) An estimate of the number of serine protease genes
expressed in sheep blowfly larvae (Lucilia cuprina). Insect Mol.
Biol. 3, 105-115.

24. Hooper, J.D., Nicol, D.L., Dickinson, J.L., Eyre, H.J., Scarman, A.L.,
Normyle, J.F., Stuttgen, M.A., Douglas, M., Loveland, K.A.L.,
Sutherland, G.R. & Antalis, T.M. (1999) Testisin, a new human serine
proteinase expressed by premeiotic testicular germ cells and lost in
testicular germ cell tumors. Cancer Res. 59, 3199-3205.

25. Dickinson, J.L., Bates, E.J., Ferrante, A. & Antalis, T.M. (1995)
Plasminogen activator inhibitor type 2 inhibits tumor necrosis factor
alpha induced apoptosis. Evidence for an alternate biological
function. J. Biol. Chem. 270, 27894-27904.

26. Antalis, T.M. & Dickinson, J.L. (1992) Control of plasminogen
activator inhibitor type 2 gene expression in the differentiation of
monocytic cells. Eur. J. Biochem. 205, 203-209.

27. Bleyl, S., Nelson, L., Odelberg, S.J., Ruttenberg, H.D., Otterud, B.,
Leppert, M. & Ward, K. (1995) A gene for familial total anomalous
pulmonary venous return maps to chromosome 4p13-q12. Am. J.
Hum. Genetics 56, 408-415.

28. Cadigan, K.M. & Nusse, R. (1997) Wnt signaling: a common theme in
animal development. Genes Dev. 11, 3286-3305.

29. Bujo, H., Yamamoto, T., Hayashi, K., Hermann, M., Nimpf, J. &
Schneider, W.J. (1995) Mutant oocytic low density lipoprotein
receptor gene family member causes atherosclerosis and female
sterility. Proc. Natl Acad. Sci. USA 92, 9905-9909.

30. Kounnas, M.Z., Church, F.C., Argraves, W.S. & Strickland, D.K.
(1996) Cellular internalization and degradation of antithrombin
III-thrombin, heparin cofactor II-thrombin, and α1-antitrypsin-trypsin
complexes is mediated by the low density lipoprotein receptor-related
protein. J. Biol. Chem. 271, 6523-6529.

31. Resnick, D., Chatterton, J.E., Schwartz, K., Slayter, H. & Krieger, M.
(1996) Structures of class A macrophage scavenger receptors.
Electron microscopic study of flexible, multidomain, fibrous proteins
and determination of the disulfide bond pattern of the scavenger
receptor cysteine-rich domain. J. Biol. Chem. 271, 26924-26930.

32. Kitamoto, Y., Yuan, X., Wu, Q., McCourt, D.W. & Sadler, J.E. (1994)
Enterokinase, the initiator of intestinal digestion, is a mosaic protease
composed of a distinctive assortment of domains. Proc. Natl Acad.
Sci. USA 91, 7588-7592.

33. Torres-Rosado, A., O'Shea, K.S., Tsuji, A., Chou, S.H. & Kurachi, K.
(1993) Hepsin, a putative cell-surface serine protease, is required for
mammalian cell growth. Proc. Natl Acad. Sci. USA 90, 7181-7185.



Exhibit 11

United States Patent [19]
Dawson et al.

US005645833A

[11] Patent Number: 5,645,833
[45] Date of Patent: Jul. 8, 1997

[54] INHIBITOR RESISTANT SERINE PROTEASES

[75] Inventors: Keith Martyn Dawson; Richard James Gilbert, both of Cowley, United Kingdom

[73] Assignee: British Biotech Pharmaceuticals Limited, Oxford, United Kingdom



[21] Appl. No.: 379,621
[22] PCT Filed: Aug. 3, 1993
[86] PCT No.: PCT/GB93/01632
     § 371 Date: Feb. 3, 1995
     § 102(e) Date: Feb. 3, 1995
[87] PCT Pub. No.: WO94/03614
     PCT Pub. Date: Feb. 17, 1994

[30] Foreign Application Priority Data
     Aug. 4, 1992 [GB] United Kingdom 9216558

[51] Int. Cl. A61K 38/48; C12N 9/68; C12N 15/55; C12N 15/63
[52] U.S. Cl. 424/94.64; 435/217; 435/252.3; 435/320.1; 435/325; 435/358; 435/365; 435/367; 435/369; 435/357; 435/352; 435/356; 536/23.2
[58] Field of Search 424/94.64; 435/217, 435/172.3, 240.2, 252.3, 320.1; 536/23.2

[56] References Cited

FOREIGN PATENT DOCUMENTS

0381331    8/1990  European Pat. Off.
WO9010649  9/1990  WIPO
WO9109118  6/1991  WIPO
WO9206203  4/1992  WIPO

Primary Examiner - Dian Q Jacobson
Attorney, Agent, or Firm - Hale and Dorr

[57] ABSTRACT

Serine proteases of the chymotrypsin superfamily are modified
so that they exhibit resistance to serine protease inhibitors.
If such modified serine proteases have fibrinolytic,
thrombolytic, antithrombotic or prothrombotic properties,
they are useful in the treatment of blood clotting diseases or
conditions.

22 Claims, 18 Drawing Sheets



U.S. Patent    Jul. 8, 1997    Sheet 1 of 18    5,645,833

U.S. Patent    Jul. 8, 1997    Sheet 2 of 18    5,645,833

U.S. Patent    Jul. 8, 1997    Sheet 3 of 18    5,645,833

U.S. Patent    Jul. 8, 1997    Sheet 4 of 18    5,645,833

U.S. Patent    Jul. 8, 1997    Sheet 5 of 18    5,645,833

U.S. Patent    Jul. 8, 1997    Sheet 6 of 18    5,645,833

U.S. Patent    Jul. 8, 1997    Sheet 7 of 18    5,645,833






U.S. Patent    Jul. 8, 1997    Sheet 8 of 18    5,645,833









U.S. Patent    Jul. 8, 1997    Sheet 9 of 18    5,645,833






U.S. Patent Jul. 8, 1997 Sheet 10 of 18 5,645,833







U.S. Patent Jul. 8, 1997 Sheet 11 of 18 5,645,833






U.S. Patent Jul. 8, 1997 Sheet 12 of 18 5,645,833







U.S. Patent Jul. 8, 1997 Sheet 13 of 18 5,645,833




U.S. Patent Jul. 8, 1997 Sheet 14 of 18 5,645,833




U.S. Patent Jul. 8, 1997 Sheet 15 of 18 5,645,833




[Restriction map. Labelled sites: BamHI(917A), BamHI(7884), NoM(1822), SoeI(1843), SfiI(1830), NdeI(2222), SfiI(2266), NarI(2312), HindIII(2476)]



U.S. Patent Jul. 8, 1997 Sheet 16 of 18 5,645,833




U.S. Patent Jul. 8, 1997 Sheet 17 of 18 5,645,833


















U.S. Patent Jul. 8, 1997 Sheet 18 of 18 5,645,833






















5,645,833 



INHIBITOR RESISTANT SERINE
PROTEASES

The present invention relates to serine proteases of the
chymotrypsin superfamily which have been modified so that
they exhibit resistance to serine protease inhibitors. The
invention also relates to the precursors of such compounds,
their preparation, to nucleic acid coding for them and to their
pharmaceutical use.

Serine proteases are endopeptidases which use serine as
the nucleophile in peptide bond cleavage. There are two
known superfamilies of serine proteases and these are the
chymotrypsin superfamily and the Streptomyces subtilisin
superfamily (Barrett, A. J., in: Proteinase Inhibitors, Ed.
Barrett, A. J. et al., Elsevier, Amsterdam, pp 3-22 (1986) and
James, M. N. G., in: Proteolysis and Physiological
Regulation, Ed. Ribbons, D. W. et al., Academic Press, New
York, pp 125-142 (1976)).

The present invention is particularly concerned with
serine proteases of the chymotrypsin superfamily which
includes such compounds as plasmin, tissue plasminogen
activator (t-PA), urokinase-type plasminogen activator
(u-PA), trypsin, chymotrypsin, granzyme, elastase, acrosin,
tonin, myeloblastin, prostate-specific antigen (PSA),
gamma-renin, tryptase, snake venom serine proteases,
adipsin, protein C, cathepsin G, complement components
C1R, C1S and C2, complement factors B, D and I, chymase,
hepsin, medullasin and proteins of the blood coagulation
cascade including kallikrein, thrombin, and Factors VIIa,
IXa, Xa, XIa and XIIa. Members of the chymotrypsin
superfamily have amino acid and structural homology of the
catalytic domains, although a comparison of the sequences
of the catalytic domains reveals the presence of insertions or
deletions of amino acids. However, these insertions and
deletions map to the surface of the folded molecule and thus
do not affect the basic structure although it is likely that they
contribute to the specificity of interactions of the molecule
with substrates and inhibitors (Strassburger, W. et al., FEBS
Lett., 157, 219-223 (1983)).

Serine protease inhibitors are also well known and are
divided into the following families: the bovine pancreatic
trypsin inhibitor (BPTI) family, the Kazal family, the
alpha-2-macroglobulin (A2M) family, the Streptomyces subtilisin
inhibitor (SSI) family, the serpin family, the Kunitz family,
the four-disulphide core family, the potato inhibitor family
and the Bowman-Birk family.

Serine protease inhibitors inhibit their cognate serine
proteases and form stable 1:1 complexes with these proteases.
Structural data are available for several protease-inhibitor
complexes including trypsin-BPTI, chymotrypsin-ovomucoid
inhibitor and chymotrypsin-potato inhibitor
(Read, R. J. et al., in: Proteinase Inhibitors, Ed. Barrett, A.
J. et al., Elsevier, Amsterdam, pp 301-336 (1986)). A
structural feature which is common to all the serine protease
inhibitors is a loop extending from the surface of the
molecule which contains the recognition sequence for the
active site of the cognate serine protease and, in fact, there
is remarkable similarity in the specific interactions between
different inhibitors and their cognate serine proteases,
despite the diverse sequences of the inhibitors.

The serine proteases of the chymotrypsin superfamily
play an important role in human and animal physiology.
Some of the most important serine protease inhibitors are
those which are involved in blood coagulation and
fibrinolysis. In the process of blood coagulation, a cascade of
enzyme activities is involved in generating a fibrin network
which forms the framework of a clot or thrombus.
Degradation of the fibrin network (fibrinolysis) involves the
protease plasmin. Plasmin is formed in the body from
its inactive precursor plasminogen by cleavage of the
peptide bond between arginine 561 and valine 562 of
plasminogen. This reaction is catalysed by t-PA or by u-PA.

If the balance between the clotting and fibrinolytic
systems becomes locally disturbed, intravascular clots may
form at inappropriate locations leading to conditions such as
coronary thrombosis and myocardial infarction, deep vein
thrombosis, stroke, peripheral arterial occlusion and
embolism. A known way of treating such conditions is to
administer to a patient a serine protease of the chymotrypsin
superfamily or the precursor of such an enzyme. For
example, t-PA, u-PA and plasminogen in the form of
anisoylated plasminogen complexed with streptokinase are used in
the treatment of myocardial infarction; plasminogen is used
to supplement the natural circulatory plasminogen level to
enhance thrombolytic therapy; and protein C is used as an
antithrombotic agent. Serine proteases of the chymotrypsin
superfamily, for example factors VIIa and IX, are
administered for induction of blood clotting in disorders such as
haemophilia. A major problem with the use of all of these
agents in this type of therapy is their rapid neutralisation by
serine protease inhibitors which reduces the efficiency of the
therapy and increases the dose of agent required. It would
therefore be advantageous to develop modified analogues of
these endopeptidases which are resistant to inactivation by
serine protease inhibitors whilst maintaining their activity.
However, it is not easy to predict modifications which will
result in increased resistance to inhibition without significant
decrease in endopeptidase activity.

WO-A-9010649 discloses serine proteases of the
chymotrypsin superfamily which have been modified and which
are said to have increased resistance to serine protease
inhibitors. The authors of that document have studied the
known structure of the complex between trypsin and BPTI
and have realised that, other than the amino acids in the
major recognition site, the amino acids of trypsin that make
direct contact with BPTI are located in the region between
residues 37 and 41 and in the region between residues 210
to 213 of the polypeptide chain. The authors have then
extrapolated from this on the basis that there is a high degree
of structural homology between the catalytic domains of
serine proteases and have suggested that mutation of a
residue in any serine protease equivalent to the Tyr-39
residue in trypsin would lead to increased resistance of the
modified analogue compared with the wild-type serine
protease. They also suggest that inhibition resistant t-PA
analogues can be made by mutation of an additional stretch of
seven amino acids which occurs in tPA, but not in trypsin,
adjacent to the predicted contact point at Arg-304
(equivalent to Tyr-39 of trypsin). However, although the
catalytic domains of members of the chymotrypsin
superfamily of serine proteases do, in general, have sequence and
structural homology, Tyr-39 of trypsin is on a loop structure
on the surface of the protein and, as is shown in FIG. 1, the
equivalent regions of other serine proteases are highly
variable within the superfamily. Indeed, this is
acknowledged in WO-A-9010649. It is, therefore, by no means
evident that the specific conformation of the loop in this
region of the protein is conserved between different serine
proteases, especially in cases where the number of residues
in the loop differs, as is the case for trypsin and plasmin.
Thus, although the residues in the region may be aligned
sequentially because of the alignment of their flanking
regions which do have similar sequences, it is not at all
evident that their side-chains are in equivalent spatial
locations and, therefore, residues which are equivalent in a
sequence alignment are not necessarily able to form
equivalent interactions in the folded protein. If plasmin is taken as
an example, it can be seen from FIG. 1 that there are three
hydrophobic residues (Phe-22, Met-24 and Phe-26) which
could be involved in a similar hydrophobic interaction to
that of Tyr-39 in the trypsin/BPTI complex. The numbering
of the plasmin residues just mentioned is the numbering of
SEQ ID No 2 which depicts the protease domain of plasmin.
The residue designated 1 in SEQ ID No 2 is at position 562
of the mature protein. A study of FIG. 1 shows that any of
these residues could be equivalent to Tyr-39 of trypsin which
occurs at position 29 in the numbering system of FIG. 1.
Clearly, therefore, the method described in WO-A-9010649
for designing a protease which is resistant to inhibition is not
wholly reliable and it would be preferable to design
inhibition resistant mutants in a different way.
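
The numbering convention used above can be illustrated with a short sketch. It is not part of the patent disclosure. It assumes only the stated fact that residue 1 of SEQ ID No 2 lies at position 562 of the mature protein; the function names are hypothetical.

```python
# Illustrative only: convert between SEQ ID No 2 (protease-domain) numbering
# and mature-protein numbering. The single assumption, taken from the text,
# is that residue 1 of SEQ ID No 2 is position 562 of the mature protein.
PROTEASE_DOMAIN_START = 562


def to_mature_numbering(domain_pos: int) -> int:
    """Map a SEQ ID No 2 position to the mature-protein position."""
    if domain_pos < 1:
        raise ValueError("SEQ ID No 2 positions start at 1")
    return domain_pos + PROTEASE_DOMAIN_START - 1


def to_domain_numbering(mature_pos: int) -> int:
    """Map a mature-protein position back to SEQ ID No 2 numbering."""
    if mature_pos < PROTEASE_DOMAIN_START:
        raise ValueError("position lies before the protease domain")
    return mature_pos - PROTEASE_DOMAIN_START + 1


if __name__ == "__main__":
    # The three hydrophobic residues named in the text.
    for name, pos in [("Phe", 22), ("Met", 24), ("Phe", 26)]:
        print(f"{name}-{pos} (SEQ ID No 2) = position {to_mature_numbering(pos)}")
```

Under this offset, Phe-22, Met-24 and Phe-26 of SEQ ID No 2 fall at positions 583, 585 and 587 of the mature protein.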

The present inventors have realised that, because the
serine protease inhibitors are structurally homologous in
their active centre loop and form similar interactions with
their cognate serine proteases (Read, R. J. et al., in:
Proteinase Inhibitors, Ed. Barrett, A. J. et al., Elsevier,
Amsterdam, pp 301-336 (1986)), mutations in any given
serine protease which result in resistance to inhibition by a
serine protease inhibitor may be applicable to mutations of
spatially or sequentially equivalent residues in any other
member of the chymotrypsin superfamily.

The interaction between enzyme and inhibitor
responsible for inhibition of enzyme activity involves the catalytic
site amino acids of the enzyme and the reactive site amino
acids of the inhibitor. This principal interaction is stabilised
by other interactions between the molecules. Although there
is a comparatively large surface of interaction between the
protease and the inhibitor, the protease/inhibitor complex is
mainly stabilised by a few key interactions. These are
exemplified by the interactions observed in the
protease/inhibitor complex between trypsin and BPTI (Huber, R. et
al., J. Mol. Biol. 89:73-101 (1974)), which serves as a model
for the interaction between the catalytic domains of other
serine proteases and their cognate inhibitors. In the
trypsin/BPTI complex, the key residues of the protease, apart from
those in the principal recognition site, which interact with
the inhibitor are residues 37-41 and 210-213 (chymotrypsin
numbering), with Tyr-39 being the most important. This
interaction served as the basis for WO-A-9010649 in which
the spatially equivalent residues in the t-PA/PAI-1 complex
were identified, and inhibitor-resistant mutants were
described.

In contrast to the disclosure of WO-A-9010649, the present
inventors have realised that the desired disruption of the
protease/inhibitor interactions which leads to inhibitor
resistance need not be caused by mutating the specific residues
identified in that document or their equivalents in other
serine proteases. Instead, residues in spatial, rather than
sequential, proximity to these key residues may be mutated,
resulting in a less stable complex between the protease and
the inhibitor.

In a first aspect of the present invention, there is provided
a modified endopeptidase of the chymotrypsin superfamily
of serine proteases or a precursor of such an endopeptidase,
which is resistant to serine protease inhibitors, characterised
in that the modification comprises the mutation of one or
more residues in close spatial proximity (other than
sequential proximity) to a site of interaction between the protease
and a cognate protease inhibitor.

In the context of this invention, the term 'precursor',
when used in relation to a serine protease, refers to a protein
which is cleavable by an enzyme to produce an active serine
protease.



Mutations resulting in resistance to the inhibitor may
induce:

i) a conformational change in the local fold of the protease
such that the resulting complex with the inhibitor is less
stable than the equivalent complex between the
inhibitor and the wild-type protein;

ii) a change in the relative orientations of the protease and
inhibitor on forming a complex such that the resulting
complex is less stable than the equivalent complex
between the inhibitor and the wild-type protein;

iii) a change in the steric bulk of the protease in the region
of the inhibitor-binding site such that the resulting
complex is less stable than the equivalent complex
between the inhibitor and the wild-type protein;

iv) a change in the electrostatic potential field in the
region of the inhibitor-binding site such that the
resulting complex is less stable than the equivalent complex
between the inhibitor and the wild-type protein; or

v) any combination of the above.

The residues to be mutated need not be sequentially close
to the key residues involved in the protease/inhibitor
interaction, since the three-dimensional folding of the
protease chain brings sequentially distant residues into spatial
proximity. It is necessary to select the residues for mutation
based on a model of either the protease used to generate the
mutant, or of another member of the chymotrypsin
superfamily of serine proteases. Where the three-dimensional
structure of the protease to be mutated is not known, the
selection of residues for mutation may be based either on a
three-dimensional model of the protein to be mutated
derived using homology modelling or other techniques, or
on sequence alignments between the protein to be mutated
and other members of the chymotrypsin superfamily of
serine proteases with known three-dimensional structures. If
sequence alignments are employed, it is not necessary to
generate a three-dimensional structural model of the
protease of interest in order to select residues for mutation to
give inhibitor resistance, as spatial proximity to the key
residues can be inferred from those proteins in the alignment
with known three-dimensional structures. The spatial
relationships between the residues to be mutated and the key
residues in the protease/inhibitor interaction may be inferred
by any appropriate method. Suitable methods are known to
those skilled in the art.
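
One simple way to picture the selection of residues that are spatially, but not sequentially, close to a key residue is a distance filter over model coordinates. The sketch below is illustrative only and is not the method of the patent. The 8.0 Angstrom cutoff, the minimum sequence separation of 3, the use of one representative atom per residue, and the toy coordinates are all assumptions introduced here.

```python
import math

# Hypothetical toy model: residue number -> (x, y, z) of one representative
# atom, in Angstroms. These coordinates are invented for illustration.
TOY_COORDS = {
    10: (0.0, 0.0, 0.0),
    11: (3.8, 0.0, 0.0),   # treated as the "key" residue below
    12: (7.6, 0.0, 0.0),
    50: (1.0, 4.0, 0.0),   # distant in sequence, close in space
    51: (20.0, 20.0, 20.0),
}


def spatially_close_residues(coords, key_residues, cutoff=8.0, min_seq_gap=3):
    """Return residues within `cutoff` of a key residue while being at
    least `min_seq_gap` positions away from that key residue in sequence.

    Both thresholds are assumed values, not taken from the patent.
    """
    selected = set()
    for res, xyz in coords.items():
        if res in key_residues:
            continue
        for key in key_residues:
            if abs(res - key) < min_seq_gap:
                continue  # sequential neighbour of this key residue: skip
            if math.dist(xyz, coords[key]) <= cutoff:
                selected.add(res)
                break
    return sorted(selected)


if __name__ == "__main__":
    print(spatially_close_residues(TOY_COORDS, {11}))
```

With the toy coordinates, residues 10 and 12 are excluded as sequential neighbours of residue 11, residue 51 is too far away, and only residue 50 is returned.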

The modified serine protease may be any serine protease
of the chymotrypsin superfamily since all of these enzymes
have a common mechanism of action. Examples of serine
proteases which can be modified according to the
present invention are as follows:

plasmin, tissue plasminogen activator (t-PA),
urokinase-type plasminogen activator (u-PA), trypsin, chymotrypsin,
granzyme, elastase, acrosin, tonin, myeloblastin,
prostate-specific antigen (PSA), gamma-renin, tryptase, snake venom
serine proteases, adipsin, protein C, cathepsin G,
complement components C1R, C1S and C2, complement factors B,
D and I, chymase, hepsin, medullasin and proteins of the
blood coagulation cascade including kallikrein, thrombin,
and Factors VIIa, IXa, Xa, XIa and XIIa.

However, modified analogues of plasmin, t-PA, u-PA,
activated protein C, thrombin, factor VIIa, factor IXa, factor
Xa, factor XIa and factor XIIa are particularly useful, as is
a modified version of plasminogen, since all of these
compounds can be used as fibrinolytic or thrombotic agents. An
inhibition resistant plasmin analogue is particularly
preferred.

The serine protease inhibitor to which the modified serine
protease of the invention is resistant will obviously depend
on which serine protease has been modified. In the case of
plasmin, the primary physiological inhibitor is
α2-antiplasmin which belongs to the serpin family of serine
protease inhibitors. The reaction between plasmin and
α2-antiplasmin consists of two steps: a very fast reversible
reaction between the kringle 1 lysine binding site of plasmin
and the carboxy-terminal region of the inhibitor, followed by
a reaction between the catalytic site of plasmin and the
reactive site of the inhibitor which results in the formation
of a very stable 1:1 stoichiometric enzymatically inactive
complex (Holmes, W. E. et al., J. Biol. Chem., 262,
1659-1664 (1987)). Therefore, when the serine protease is
plasmin, it is particularly useful if the serine protease
inhibitor to which the plasmin is resistant is α2-antiplasmin.
Plasmin is also inhibited by α2-macroglobulin and
α1-antitrypsin and resistance to inhibition by these
inhibitors is also useful.

From a three-dimensional model of the
plasmin/antiplasmin complex (described in Method 1), it has been
determined that, in plasmin, the residues which are in close
spatial proximity to the key residues of interaction between
the protease and the inhibitor are residues 17-20, 44-54, 62,
154, 158 and 198-213. The numbering used above is the
numbering system of sequence ID No 2 which represents the
protease domain of plasmin and begins at position 562 of the
mature protein. In order to be resistant to inhibition by a
serine protease inhibitor such as α2-antiplasmin, it is
necessary to modify plasmin in one or more of these regions.
Protease inhibition resistance can be induced in other serine
proteases of the chymotrypsin superfamily by modifying
equivalent regions of these proteins. FIG. 1 shows the
sequences of the protease domains of a variety of proteases
and, from a study of FIG. 1, it is clear where modifications
should be made in order to induce resistance to protease
inhibitors. In the numbering system of FIG. 1, the
modification regions just mentioned occur at residues 17-22,
49-64, 72, 203, 214, and 264-281. The types of mutations
which are suitable for inducing resistance to inhibition
include single or multiple amino acid substitutions,
additions or deletions. However, amino acid substitutions are
particularly preferred.

In plasmin, examples of amino acid substitution
mutations which result in a modified response to inhibition by
α2-antiplasmin, using the numbering system of SEQ ID No
2, are Glu-62 to Lys or Ala, Ser-17 to Leu, Arg-19 to Glu or
Ala, and Glu-45 to Lys, Arg or Ala. Resistance to protease
inhibition can be induced in other serine proteases by
making modifications at equivalent positions. The degree of
resistance to inhibition may be altered by making either
single or multiple mutations in the protease, or by altering
the nature of the amino acid used for substitution.
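
The way a substitution such as "Ser-17 to Leu" is read against a numbered sequence can be sketched as follows. This is an illustration under stated assumptions, not part of the patent. The five-residue toy sequence is invented and does not reproduce SEQ ID No 2, so its positions do not correspond to the plasmin positions named above; the function name is hypothetical.

```python
# Standard three-letter to one-letter codes for the residues used here.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Glu": "E",
    "Gly": "G", "Leu": "L", "Lys": "K", "Ser": "S",
}

# Invented toy sequence (Gly-Ser-Arg-Ala-Glu); NOT the plasmin sequence.
TOY_SEQUENCE = "GSRAE"


def apply_substitution(seq, wild_type, position, replacement):
    """Replace the residue at a 1-based position, after checking that the
    sequence really carries the stated wild-type residue there."""
    idx = position - 1
    if not 0 <= idx < len(seq):
        raise ValueError(f"position {position} is outside the sequence")
    expected = THREE_TO_ONE[wild_type]
    if seq[idx] != expected:
        raise ValueError(
            f"expected {wild_type} at {position}, found {seq[idx]!r}"
        )
    return seq[:idx] + THREE_TO_ONE[replacement] + seq[idx + 1:]


if __name__ == "__main__":
    # Ser at toy position 2 replaced by Leu.
    print(apply_substitution(TOY_SEQUENCE, "Ser", 2, "Leu"))
    # Multiple mutations are applied one after another.
    double = apply_substitution(TOY_SEQUENCE, "Ser", 2, "Leu")
    print(apply_substitution(double, "Glu", 5, "Lys"))
```

Checking the wild-type residue before substituting guards against applying a mutation under the wrong numbering system, which matters here because the text uses both SEQ ID No 2 numbering and the numbering of FIG. 1.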

In addition to the modification of the invention, the serine
protease may be modified in other ways as compared to
wild-type proteins. Any modifications may be made to the
protein provided that it does not lose its activity.

As an alternative to a modified serine protease, it is also
possible to modify a precursor of the enzyme so that the
enzyme derived from the precursor will have the desired
resistance to inhibition. An example of a serine protease
precursor is plasminogen which is the inactive precursor of
plasmin. Conversion of plasminogen to plasmin is
accomplished by cleavage of the peptide bond between arginine
561 and valine 562 of plasminogen. Under physiological
conditions this cleavage is catalysed by t-PA or u-PA.
Cleavage of a modified plasminogen variant of the present
invention will produce a plasmin variant as described above
and it is, of course, preferable that the plasminogen variant
will be cleaved to produce one of the preferred plasmin
variants described above.

Again, as with serine proteases, the precursors may have
other modifications. Analysis of the wild-type plasminogen
molecule has revealed that it is a glycoprotein composed of
a serine protease domain, five kringle domains and an
N-terminal sequence of 78 amino acids which may be
removed by plasmin cleavage. Cleavage by plasmin
involves hydrolysis of the Arg(68)-Met(69), Lys(77)-Lys(78)
or Lys(78)-Val(79) bonds to create forms of plasminogen
with an N-terminal methionine, lysine or valine residue,
all of which are commonly designated as lys-plasminogen.
Intact plasminogen is referred to as glu-plasminogen
because it has an N-terminal glutamic acid residue.
Glycosylation occurs on residues Asn(289) and Thr(346) but the
extent and composition are variable, leading to the presence
of a number of different molecular weight forms of
plasminogen in the plasma. Any of the above plasminogen variants
may be modified to produce a variant according to the
present invention. The protein sequencing studies of
Sottrup-Jensen et al (in: Atlas of Protein Sequence and
Structure (Dayhoff, M. O., ed.) 5 suppl. 3, p.95 (1978))
indicated that plasminogen was a 790 amino acid protein
and that the site of cleavage was the Arg(560)-Val(561)
peptide bond. A plasminogen variant which is suitable for
modification according to the present invention is a 791
residue protein with an extra Ile at position 65 and encoded
by cDNA isolated by Forsgren et al (FEBS Letters, 213,
254-260 (1987)). The serine protease domain of any of these
plasminogen analogues can be recognised by its homology
with serine proteases and on activation to plasmin is the
catalytically active domain involved in fibrin degradation.
The five kringle domains are homologous to those in other
plasma proteins such as tPA and prothrombin and are
involved in fibrin binding and thus localisation of
plasminogen and plasmin to thrombi.
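
The three plasmin cleavage sites named above each define one lys-plasminogen form. The sketch below tabulates, for each bond, the new N-terminal residue and the number of residues removed and remaining. The 791-residue length used in the example call is taken from the variant described above; whether the quoted bond positions use that numbering is an assumption, and the function name is hypothetical.

```python
# Bonds hydrolysed by plasmin, as ((residue, position), (residue, position))
# pairs taken from the text.
CLEAVAGE_BONDS = [
    (("Arg", 68), ("Met", 69)),
    (("Lys", 77), ("Lys", 78)),
    (("Lys", 78), ("Val", 79)),
]


def lys_plasminogen_forms(full_length: int):
    """Return (new N-terminal residue, residues removed, residues remaining)."""
    forms = []
    for (_, last_removed_pos), (new_n_terminus, _) in CLEAVAGE_BONDS:
        removed = last_removed_pos
        forms.append((new_n_terminus, removed, full_length - removed))
    return forms


# Example call, assuming the 791-residue glu-plasminogen described above.
for n_term, removed, remaining in lys_plasminogen_forms(791):
    print(f"N-terminal {n_term}: {removed} residues removed, {remaining} remain")
```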

The plasminogen analogues of the present invention may
also contain other modifications (as compared to wild-type
glu-plasminogen) which may be one or more additions,
deletions or substitutions. Examples of particularly suitable
plasminogen analogues are disclosed in our copending
applications WO-A-9109118 and GB 9222758.6 and comprise
plasminogen analogues which are cleavable by an
enzyme involved in blood clotting to produce active plasmin.
These plasminogen analogues may, according to the
present invention, be further modified so that, on cleavage,
the plasmin which is produced is resistant to inhibition by
serine protease inhibitors such as α2-antiplasmin. Other
plasminogen analogues which may be modified to produce
the plasminogen analogues of the invention are analogues in
which there has been an addition, removal, substitution or
alteration of one or more kringle domains. Other suitable
plasminogen analogues are Lys-plasminogen variants in
which the amino terminal 68, 77 or 78 amino acids have
been deleted. Such variants may have enhanced fibrin binding
activity as has been observed for lys-plasminogen compared
to wild-type glu-plasminogen (Bok, R. A. and Mangel,
W. F., Biochemistry, 24, 3279-3286 (1985)). Also included
within the scope of the invention are plurally-modified
plasminogen analogues which include one or more modifications
to prevent, reduce or alter glycosylation patterns.
Such analogues may have a longer half-life, reduced plasma
clearance and/or higher specific activity.

The modified serine proteases and serine protease precur- 
sors of the invention can be prepared by any suitable method 
and, in a second aspect of the invention, there is provided a 
process for the preparation of such a serine protease or serine 



5,645,833 






protease precursor, the process comprising coupling
together successive amino acid residues and/or ligating
oligopeptides. Although the proteins may, in principle, be
synthesised wholly or partly by chemical means, it is
preferred to prepare them by ribosomal translation, preferably
in vivo, of a corresponding nucleic acid sequence. The
process may further include an appropriate glycosylation
step.

It is preferred to produce proteins of the invention using 
recombinant DNA technology. DNA encoding a naturally 
occurring serine protease or precursor may be obtained from 
a cDNA or genomic clone or may be synthesised. Amino 
acid substitutions, additions or deletions are preferably 
introduced by site-specific mutagenesis. DNA sequences 
encoding glu-plasminogen, lys-plasminogen, other plasmi- 
nogen analogues and serine protease variants may be 
obtained by procedures familiar to those skilled in the art of
genetic engineering. 

The process for producing proteins using recombinant
DNA technology will usually include the steps of inserting
a suitable coding sequence into an expression vector and
transfecting the vector into a suitable host cell. Therefore, in
a third aspect of the invention there is provided nucleic acid
coding for a modified serine protease as described above.
The nucleic acid may be either DNA or RNA and may be in
the form of a vector such as a plasmid, cosmid or phage. The
vector may be adapted to transfect or transform prokaryotic
cells, such as bacterial cells, and/or eukaryotic cells, such as
yeast or mammalian cells. The vector may be a cloning
vector or an expression vector and comprises a cloning site
and, preferably, at least one marker gene. An expression
vector will additionally have a promoter operatively linked
to the sequence to be inserted into the cloning site and,
preferably, a sequence enabling the protein product to be
secreted.

Most of the proteins of the present invention, including
molecules such as tPA, can easily be obtained by inserting
the coding sequence into an expression vector as described
and transfecting the vector into a suitable host cell which
may be a bacterium such as E. coli, a eukaryotic
microorganism such as yeast or a higher eukaryotic cell. With
molecules such as plasminogen which are unusually difficult
to express, it may be necessary to use a vector of the type
described in our copending application, WO-A-9109118,
which comprises a first nucleic acid sequence coding for the
modified serine protease, operatively linked to a second
nucleic acid sequence containing a strong promoter and
enhancer sequence derived from human cytomegalovirus, a
third nucleic acid sequence encoding a polyadenylation
sequence derived from SV40 and a fourth nucleic acid
sequence coding for a selectable marker expressed from an
SV40 promoter and having an additional SV40
polyadenylation signal at the 3' end of the selectable marker sequence.
Such a vector may either comprise a single nucleic acid
molecule or a plurality of such molecules so that, for
example, the first, second and third sequences may be
contained in a first nucleic acid molecule and the fourth
sequence may be contained in a second nucleic acid
molecule. This vector is particularly useful for the expression of
plasminogen and plasminogen analogues.

For any of the proteins of the invention, the vector is
preferably chosen so that the protein is expressed and
secreted into the cell culture medium in a biologically active
form without the need for any additional biological or
chemical procedures. In the case of plasminogen, this can be
achieved using the vector described above.

In a further aspect of the invention there is provided a
process for the preparation of nucleic acid encoding a
modified serine protease which exhibits resistance to serine
protease inhibitors, the process comprising coupling
together successive nucleotides and/or ligating oligo- and/or
poly-nucleotides.

In a further aspect of the invention, there is provided a cell
transformed or transfected by a vector as described above.
Suitable cells or cell lines include both prokaryotic and
eukaryotic cells. A typical example of a prokaryotic cell is a
bacterial cell such as E. coli. Suitable eukaryotic cells
include yeast cells such as Saccharomyces cerevisiae or
Pichia pastoris. Other examples of suitable eukaryotic cells
are mammalian cells which grow in continuous culture and
examples of such cells include Chinese hamster ovary
(CHO) cells, mouse myeloma cell lines such as
P3X63-Ag8.653 and NSO, COS cells, HeLa cells, 293 cells, BHK
cells, melanoma cell lines such as the Bowes cell line, mouse
L cells, human hepatoma cell lines such as HepG2, mouse
fibroblasts and mouse NIH 3T3 cells. CHO cells are
particularly suitable as hosts for the expression of plasminogen
and plasminogen analogues. The transformation of the cells
may be achieved by any convenient method but
electroporation is a particularly suitable method.

For some molecules, such as plasminogen, there may be 
a low level of undesirable activation during culture. 
Therefore, in a further aspect of the invention, there is 
provided a eukaryotic host cell transfected or transformed
with a first DNA sequence encoding a serpin-resistant serine 
protease and with an additional DNA sequence encoding the 
cognate inhibitor. 

The modified serine proteases of the present invention 
have a variety of uses and, if the serine protease is a 
fibrinolytic or thrombolytic enzyme, it will be useful in a
method for the treatment and/or prophylaxis of diseases or 
conditions caused by blood clotting, the method comprising 
administering to a patient an effective amount of the serine 
protease. 

Therefore, in a further aspect of the invention, there is
provided a modified serine protease according to the first
aspect of the invention, which is a serine protease having
fibrinolytic, thrombolytic, antithrombotic or prothrombotic
properties, for use in medicine, particularly in the treatment
of diseases mediated by blood clotting. Such conditions
include myocardial and cerebral infarction, arterial and
venous thrombosis, thromboembolism, post-surgical
adhesions, thrombophlebitis and diabetic vasculopathies.

The invention also provides the use of a modified 
fibrinolytic, thrombolytic, antithrombotic or prothrombotic 
serine protease according to the first aspect of the invention 
in the preparation of an agent for the treatment and/or 
prophylaxis of diseases or conditions mediated by blood 
clotting. Examples of such conditions are mentioned above.

Furthermore, there is also provided a pharmaceutical or 
veterinary composition comprising one or more modified 
serine proteases of the first aspect of the invention together 
with a pharmaceutically and/or veterinarily acceptable car- 
rier. 

The composition may be adapted for administration by
oral, topical or parenteral routes including intravenous or
intramuscular injection or infusion. Suitable injectable
compositions may comprise a preparation of the compound in
isotonic physiological saline and/or buffer and may also
include a local anaesthetic to alleviate the pain of the
injection. Similar compositions may be used for infusions. If
the compound is administered topically, it may be
formulated as a cream, ointment or lotion in a suitable base.

The compounds of the invention may be supplied in unit
dosage form, for example as a dry powder or water-free
concentrate in a hermetically sealed container such as an
ampoule or sachet.

The quantity of material to be administered will depend
on the amount of fibrinolysis or inhibition of clotting
required, the required speed of action, the seriousness of the
thromboembolic position and the size of the clot. The
precise dose to be administered will, because of the very
nature of the condition which compounds of the invention
are intended to treat, be determined by the physician. As a
guideline, however, a patient being treated for a mature
thrombus will generally receive a daily dose of a plasminogen
analogue of from 0.01 to 10 mg/kg of body weight either
by injection in, for example, up to 5 doses or by infusion.

The invention will now be further described by way of 
example only with reference to the following drawings in 
which: 

FIG. 1 shows the alignment of the catalytic domain amino
acids of the chymotrypsin superfamily;

FIGS. 2a and 2b show maps of the pGWH and pGWHgP
vectors;

FIG. 3 shows the effect of α2-antiplasmin on the activity
of plasminogen mutant A3;

FIG. 4 shows the sequence alignment of ovalbumin and
α2-antiplasmin used to generate the α2-antiplasmin model.

The following examples further illustrate the invention.

Examples 1 to 5 describe the expression of various
plasminogen analogues from higher eukaryotic cells and
Example 6 describes an assay used to assess resistance to
α2-antiplasmin.



EXAMPLE 1

Construction and Expression of A1 and A12

The isolation of plasminogen cDNA and construction of 
the vectors pGWH and pGWHgP (FIG. 2) have been 
described in WO-A-9109118. In pGWHgP, transcription
through the plasminogen cDNA can initiate at the HCMV 
promoter/enhancer and the selectable marker gpt is 
employed. 

The techniques of genetic manipulation, expression and
protein purification used in the manufacture of the modified
plasminogen examples to follow are well known to those
skilled in the art of genetic engineering. A description of
most of the techniques can be found in one of the following
laboratory manuals: "Molecular Cloning" by T. Maniatis, E.
F. Fritsch and J. Sambrook published by Cold Spring Harbor
Laboratory, Box 100, New York, or "Basic Methods in
Molecular Biology" by L. G. Davis, M. D. Dibner and J. F.
Battey published by Elsevier Science Publishing Co Inc,
New York.

Additional and modified methodologies are detailed in the
methods section below.

Plasminogen analogues have been constructed which are
designed to be resistant to inhibition by α2-antiplasmin. A1
is a plasminogen analogue in which the amino acid Phe-587
is replaced by Asn. A12 is a plasminogen analogue in which
the Arg-580 is replaced by Glu. The modification strategy in
this example is essentially as described in WO-A-9109118
Example 3, with the mutagenesis reaction carried out on the
1.87 kb KpnI to HincII fragment of the thrombin activatable
plasminogen analogue T19 cloned into the bacteriophage
M13mp18. Single stranded template was prepared and the
mutation made by oligonucleotide directed mutagenesis. For
A1, a 24 base long oligonucleotide
5'GGTGCCTCCACAATTGTGCAITCC3' (SEQ. ID. 3) was used to direct the
mutagenesis and for A12 a 27 base oligonucleotide was used,
5' CCAAACCTTGnTCAAGACTGACITGC 3' (SEQ ID 7).



Plasmid DNA was introduced into CHO cells by
electroporation using 800 V and 25 µF as described in the
methods section below. Selective medium (250 µg/ml
xanthine, 5 µg/ml mycophenolic acid, 1× hypoxanthine-thymidine
(HT)) was added to the cells 24 hours post
transfection and the media changed every two to three days.
Plates yielding gpt-resistant colonies were screened for
plasminogen production using an ELISA assay. Cells
producing the highest levels of antigen were re-cloned and the
best producers scaled up into flasks with production being
carefully monitored. Frozen stocks of all these cell lines
were laid down. Producer cells were scaled up into roller
bottles to provide conditioned medium from which
plasminogen protein was purified using lysine SEPHAROSE 4B.
(The word SEPHAROSE is a trade mark.)

EXAMPLE 2 

Construction and Expression of A3 and A16

The procedure of Example 1 was generally followed
except that the mutagenesis was performed on an EcoRV to
HindIII fragment (0.85 kb) containing the 3' end of wild type
plasminogen cloned into M13. The oligonucleotide used
was a 27mer 5'GrTCGAGArTCACTTTTTGGTCjTGCAC3'
(SEQ. ID. 4) which changed Glu-623 to Lys, thus
changing an acidic amino acid to a basic amino acid. The
resulting mutant was cloned as an EcoRV to SphI fragment
replacing the corresponding wild type sequence. The 27 base
oligonucleotide 5'(jTTCGAGArTCACrG(jrTGGT(jTGCAC3'
(SEQ ID 10) was used to change Glu-623 to Ala to
produce A16.

EXAMPLE 3

Construction and Expression of A4, A14 and A15

Mutant A4 is designed to disrupt ionic interactions on the
surface of plasminogen, preventing binding to antiplasmin.
The mutagenesis and sub-cloning strategy was as described
in Example 1 using a 24 base oligonucleotide
5'CTTGGGGACrTCITCAAGClAC3TGG3' (SEQ. ID. 5) designed to
convert Glu-606 to Lys. The 24 base oligonucleotide
5'CITGGGGACrTGGCrAGACA(3TGG 3' (SEQ ID 8)
was used to change Glu-606 to Ala to produce A14 and the
25 base oligonucleotide 5'CrrGGGGACITCCTrAGAC:AGTGGG 3'
(SEQ ID 9) was used to change Glu-606 to
Arg to produce A15.

EXAMPLE 4 

Construction and Expression of A5 

Plasminogen analogue A5 was designed to alter the
positioning of the Tyr 39 containing structural loop and was
made generally as described in the procedure of Example 1.
In A5, Ser-578 has been replaced by Leu using the 24mer
5'CrCGTACGAAGC:AGGACrrGCCAG3' (SEQ. ID. 6)
on the KpnI to EcoRV fragment of plasminogen in M13 as
the template. The mutation was cloned directly into
pGW1Hg.plasminogen using the restriction enzymes
HindIII and SplI. These sites had previously been introduced at
the extreme 5' end of plasminogen and at 1850 respectively
via mutagenesis; the plasminogen coding sequence was not
affected by this procedure.

EXAMPLE 5 

Construction and Expression of double mutant A3A4

Plasminogen mutant A3A4 combines the two mutations
A3 and A4 as described in Examples 2 and 3 respectively.
Mutagenesis was performed on the EcoRV to SphI fragment
of A4 cloned into M13 using the A3 mutagenesis
oligonucleotide (SEQ ID 4).
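
The substitutions introduced in Examples 1 to 5 can be collected into a small lookup table. The sketch below records each named analogue with its plasminogen position and residue change as given in the Examples, and computes the change in formal side-chain charge using textbook charge classes at neutral pH (an assumption added here, not part of the specification). The names `Substitution`, `VARIANTS` and `net_charge_change` are hypothetical.

```python
from typing import NamedTuple


class Substitution(NamedTuple):
    position: int      # plasminogen numbering, as used in the Examples
    wild_type: str
    replacement: str


VARIANTS = {
    "A1": [Substitution(587, "Phe", "Asn")],
    "A12": [Substitution(580, "Arg", "Glu")],
    "A3": [Substitution(623, "Glu", "Lys")],
    "A16": [Substitution(623, "Glu", "Ala")],
    "A4": [Substitution(606, "Glu", "Lys")],
    "A14": [Substitution(606, "Glu", "Ala")],
    "A15": [Substitution(606, "Glu", "Arg")],
    "A5": [Substitution(578, "Ser", "Leu")],
}
# The double mutant of Example 5 combines A3 and A4.
VARIANTS["A3A4"] = VARIANTS["A3"] + VARIANTS["A4"]

# Formal side-chain charges at neutral pH (textbook classes, assumed here).
CHARGE = {"Glu": -1, "Asp": -1, "Lys": 1, "Arg": 1}


def net_charge_change(name: str) -> int:
    """Sum of (replacement charge - wild-type charge) over all substitutions."""
    return sum(
        CHARGE.get(s.replacement, 0) - CHARGE.get(s.wild_type, 0)
        for s in VARIANTS[name]
    )


for name in VARIANTS:
    print(name, net_charge_change(name))
```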

EXAMPLE 6 

Plasmin-Antiplasmin Interaction Assays 

A chromogenic assay was used to assess the resistance of
the plasmin(ogen) mutants to inhibition by α2-antiplasmin.
Inhibition of plasmin activity was determined by the change 
in the rate of cleavage of the plasmin chromogenic substrate 
S2251 (Quadratech, P.O. Box 167, Epsom, Surrey. KT17 
2SB). 

Prior to assay, the plasminogens were activated to plasmin
using either urokinase for mutants in wild type plasminogen,
or thrombin for thrombin activatable plasminogen mutants
(WO-A-9109118). Activation of wild-type plasminogen to
plasmin was achieved by incubation of the plasminogen (ca.
14 µg) with urokinase (16.8x10"^ U) in 1750 µl of assay
buffer (50 mM Tris, 0.1 mM EDTA, 0.00005% Triton X100,
0.1% (w/v) human serum albumin, pH 8.0) at 37° C. for 5
mins. Activation of thrombin activatable plasminogen
mutants to plasmin was achieved by incubation of the
plasminogen (ca. 14 µg) with thrombin in 1750 µl of assay
buffer at 37° C. Hirudin was added to inhibit the thrombin
activity as thrombin cleaves the chromogenic substrate.

Plasmin (125 µl) was mixed with 250 µl S2251 (2 mg/ml
in assay buffer) and 125 µl antiplasmin (1.25 µg in assay
buffer, #4032 American Diagnostica Inc., 222 Railroad
Avenue, P.O. Box 1165, Greenwich, Conn. 06836-1165) or
125 µl assay buffer in a cuvette and the absorbance at 405
nm measured over time.
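
The final concentrations in the cuvette follow from the volumes quoted above. The sketch below works them through, reading the quantities as microlitres and micrograms and assuming that all of the roughly 14 µg of plasminogen in the 1750 µl activation mix is converted to plasmin; the variable and function names are illustrative only.

```python
def final_concentration(stock_conc: float, volume_added_ul: float,
                        total_volume_ul: float) -> float:
    """Concentration after dilution, in the same units as stock_conc."""
    return stock_conc * volume_added_ul / total_volume_ul


# Quantities quoted in the assay description.
PLASMINOGEN_UG = 14.0
ACTIVATION_VOLUME_UL = 1750.0
PLASMIN_UL, SUBSTRATE_UL, INHIBITOR_UL = 125.0, 250.0, 125.0
SUBSTRATE_STOCK_MG_PER_ML = 2.0
ANTIPLASMIN_UG = 1.25

total_ul = PLASMIN_UL + SUBSTRATE_UL + INHIBITOR_UL

# Plasmin stock after activation, in ug/ml (assumes complete activation).
plasmin_stock_ug_per_ml = PLASMINOGEN_UG / (ACTIVATION_VOLUME_UL / 1000.0)

plasmin_final_ug_per_ml = final_concentration(
    plasmin_stock_ug_per_ml, PLASMIN_UL, total_ul)
substrate_final_mg_per_ml = final_concentration(
    SUBSTRATE_STOCK_MG_PER_ML, SUBSTRATE_UL, total_ul)
antiplasmin_final_ug_per_ml = ANTIPLASMIN_UG / (total_ul / 1000.0)

print(total_ul, plasmin_final_ug_per_ml,
      substrate_final_mg_per_ml, antiplasmin_final_ug_per_ml)
```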

A Beckman DU64 spectrophotometer and Beckman
"Data Leader" data capture software were used to record
absorbance at 405 nm at 1 sec intervals for 8 minutes. The
Data Leader software package was used to calculate the first
derivative of the data to provide the rate of change of
absorbance at 405 nm against time, an estimate of active
plasmin concentration against time. Wild type plasmin was
rapidly inactivated by α2-antiplasmin; after only 15 seconds
the plasmin was essentially inactivated. In contrast,
plasminogen mutant A3 has an antiplasmin resistant phenotype and
is only slowly inactivated by antiplasmin with a t½ (half the
rate of OD change at t=15 sec) of approximately 75 seconds
(FIG. 3).
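
The analysis described above, taking the first derivative of the absorbance trace and finding when the rate has fallen to half its value at 15 seconds, can be sketched as follows. This is not the Data Leader implementation: it uses central finite differences and linear interpolation, and the trace is synthetic (an exponentially decaying rate chosen for illustration), not data from the specification. The returned value is an absolute time on the trace.

```python
import math


def first_derivative(times, values):
    """Central finite differences (one-sided at the two ends)."""
    n = len(values)
    rates = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        rates.append((values[hi] - values[lo]) / (times[hi] - times[lo]))
    return rates


def half_rate_time(times, rates, t_ref=15.0):
    """Time at which the rate first falls to half its value at t_ref."""
    ref = min(range(len(times)), key=lambda i: abs(times[i] - t_ref))
    if rates[ref] <= 0:
        return None  # no positive reference rate to halve
    target = rates[ref] / 2.0
    for i in range(ref + 1, len(times)):
        if rates[i] <= target:
            # Linear interpolation between the two bracketing samples.
            frac = (rates[i - 1] - target) / (rates[i - 1] - rates[i])
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None


# Synthetic trace: 1 s sampling for 8 minutes, rate halving every 60 s.
times = [float(t) for t in range(0, 481)]
k = math.log(2) / 60.0
absorbance = [0.5 * (1.0 - math.exp(-k * t)) for t in times]

rates = first_derivative(times, absorbance)
t_half = half_rate_time(times, rates)
print(t_half)
```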

METHODS 

1. Model structures were built by homology based on the
x-ray structures of trypsin/BPTI. A refined plasminogen
structure was modelled by homology to thrombin using the
PPACK/thrombin x-ray structure from Bode et al. (Bode, W.
et al., EMBO J. 8:3467-3475 (1989)). A refined
alpha-2-antiplasmin [A2AP] structure was modelled by homology to
ovalbumin using atomic co-ordinates from the Brookhaven
Protein Data Bank entry 1OVA, except for the loop
containing the reactive bond, which was modelled using the
co-ordinates for residues 13 to 19 of BPTI from the PDB
entry 2PTC. The alignment used to generate the A2AP
model is shown in FIG. 4. The A2AP model described here
does not include co-ordinates for the 79 N-terminal residues
and 55 C-terminal residues.

Most serine-protease-directed inhibitors react with
cognate enzymes according to a common, substrate-like
standard mechanism (Bode, W. and Huber, R., Eur. J. Biochem.
204:433-451 (1992)). In particular, they all possess an
exposed active site-binding loop with a characteristic
canonical conformation. The binding loop on the A2AP
model was therefore modelled on the equivalent loop of
BPTI (residues 13 to 19), using atomic co-ordinates from the
PDB entry 2PTC (in which BPTI is complexed with
trypsin).

The complex of A2AP and the plasmin serine protease
domain was modelled using the trypsin/BPTI complex
structure from PDB entry 2PTC. The A2AP model was fitted to
the BPTI structure by optimising the RMS difference
between the co-ordinates of the backbone atoms in the active
site-binding loops of the two inhibitors. The plasmin serine
protease domain model was fitted to the trypsin structure by
optimising the RMS difference between the co-ordinates of
the C-alpha atoms of the conserved residues in an optimal
sequence alignment of the two proteins. The A2AP/plasmin
complex model was then refined by energy-minimisation.
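
The quantity optimised in both fitting steps is the RMS difference between paired atomic co-ordinates. The sketch below computes that quantity for two equal-length lists of (x, y, z) tuples. It does not perform the rigid-body superposition itself, which in the work described was done within the modelling software; the function name and any co-ordinates passed to it are hypothetical.

```python
import math


def rms_difference(coords_a, coords_b):
    """Root-mean-square distance between paired (x, y, z) co-ordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty co-ordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Illustrative call with made-up co-ordinates (in angstroms).
loop_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.5, 0.0)]
loop_b = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (4.0, 0.5, 0.0)]
print(rms_difference(loop_a, loop_b))
```

A fitting routine would evaluate this function repeatedly while varying the rotation and translation applied to one structure, keeping the transform that gives the smallest value.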

The homology modelling was performed on a Silicon
Graphics Indigo workstation using the Quanta molecular
modelling program from Molecular Simulations
Incorporated. Sequence alignments were produced using Quanta,
the GCG sequence analysis software from the University of
Wisconsin (Devereux, Haeberli and Smithies, Nucleic Acids
Research 12(1):387-395 (1984)), and proprietary sequence
alignment software. However, the actual method by which
the homology models were built is not critical to this
invention.

The trypsin and BPTI sequences used in the homology
modelling were obtained from the Brookhaven Protein Data
Bank atomic co-ordinate entry 2PTC, the thrombin sequence
was obtained from the PPACK/thrombin co-ordinate file, the
plasminogen sequence from the SWISSPROT database
entry PLMN_HUMAN, and the A2AP sequence from the
SWISSPROT entry A2AP_HUMAN.

2. Mung Bean Nuclease Digestion 

10 units of mung bean nuclease was added to
approximately 1 ng DNA which had been digested with a restriction
enzyme in a buffer containing 30 mM NaOAc pH 5.0, 100
mM NaCl, 2 mM ZnCl2, 10% glycerol. The mung bean
nuclease was incubated at 37° for 30 minutes, inactivated for
15 minutes at 67° before being phenol extracted and ethanol
precipitated.

3. Oligonucleotide synthesis 

The oligonucleotides were synthesised by automated
phosphoramidite chemistry using cyanoethyl
phosphoramidites. The methodology is now widely used and has been
described (Beaucage, S. L. and Caruthers, M. H.
Tetrahedron Letters 24, 245 (1981) and Caruthers, M. H. Science
230, 281-285 (1985)).

4. Purification of Oligonucleotides 

The oligonucleotides were de-protected and removed
from the CPG support by incubation in concentrated NH3.
Typically, 50 mg of CPG carrying 1 micromole of
oligonucleotide was de-protected by incubation for 5 hours at 70°
in 600 µl of concentrated NH3. The supernatant was
transferred to a fresh tube and the oligomer precipitated with 3
volumes of ethanol. Following centrifugation the pellet was
dried and resuspended in 1 ml of water. The concentration of
crude oligomer was then determined by measuring the
absorbance at 260 nm. For gel purification 10 absorbance
units of the crude oligonucleotide was dried down and
resuspended in 15 µl of marker dye (90% de-ionised
formamide, 10 mM tris, 10 mM borate, 1 mM EDTA, 0.1%
bromophenol blue). The samples were heated at 90° for 1
minute and then loaded onto a 1.2 mm thick denaturing
polyacrylamide gel with 1.6 mm wide slots. The gel was
prepared from a stock of 15% acrylamide, 0.6% bisacrylamide
and 7M urea in 1× TBE and was polymerised with
0.1% ammonium persulphate and 0.025% TEMED. The gel
was pre-run for 1 hr. The samples were run at 1500 V for 4-5
hours. The bands were visualised by UV shadowing and
those corresponding to the full length product cut out and
transferred to micro-testubes. The oligomers were eluted
from the gel slice by soaking in AGEB (0.5M ammonium
acetate, 0.01M magnesium acetate and 0.1% SDS) overnight.
The AGEB buffer was then transferred to fresh tubes
and the oligomer precipitated with three volumes of ethanol
at 70° for 15 mins. The precipitate was collected by
centrifugation in an Eppendorf microfuge for 10 mins, the pellet
washed in 80% ethanol, the purified oligomer dried,
redissolved in 1 ml of water and finally filtered through a 0.45
micron micro-filter. (The word EPPENDORF is a trade
mark.) The concentration of purified product was measured
by determining its absorbance at 260 nm.
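
The absorbance readings at 260 nm are used in two ways above: to measure out 10 absorbance units of crude oligomer and to estimate the concentration of purified product. A sketch of both calculations follows. The conversion factor of about 33 µg/ml per A260 unit for single-stranded DNA is a common rule of thumb assumed here, not a figure from the specification, and the example reading and dilution are invented for illustration.

```python
# Assumed rule-of-thumb factor for single-stranded DNA (not from the text).
UG_PER_ML_PER_A260 = 33.0


def a260_units_per_ml(a260_reading: float, dilution_factor: float = 1.0) -> float:
    """A260 units per ml of undiluted stock (1 cm path length assumed)."""
    return a260_reading * dilution_factor


def oligo_concentration_ug_per_ml(a260_reading: float,
                                  dilution_factor: float = 1.0) -> float:
    """Approximate mass concentration of the undiluted stock."""
    return a260_units_per_ml(a260_reading, dilution_factor) * UG_PER_ML_PER_A260


def volume_for_units_ml(target_units: float, a260_reading: float,
                        dilution_factor: float = 1.0) -> float:
    """Volume of stock (ml) that contains the requested A260 units."""
    return target_units / a260_units_per_ml(a260_reading, dilution_factor)


# Hypothetical reading: A260 of 0.25 measured on a 1:100 dilution.
print(volume_for_units_ml(10.0, 0.25, 100.0))
print(oligo_concentration_ug_per_ml(0.25, 100.0))
```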

5. Kinasing of Oligomers 

100 pmole of oligomer was dried down and resuspended
in 20 µl kinase buffer (70 mM Tris pH 7.6, 10 mM MgCl2,
1 mM ATP, 0.2 mM spermidine, 0.5 mM dithiothreitol). 10
u of T4 polynucleotide kinase was added and the mixture
incubated at 37° for 30 mins. The kinase was then
inactivated by heating at 70° for 10 mins.

6. Dideoxy Sequencing 

The protocol used was essentially as has been described
(Biggin, M. D., Gibson, T. J., Hong, G. F. P.N.A.S. 80
3963-3965 (1983)). Where appropriate the method was
modified to allow sequencing on plasmid DNA as has been
described (Guo, L-H., Wu, R. Nucleic Acids Research 11
5521-5540 (1983)).

7. Transformation 

Transformation was accomplished using standard
procedures. The strain used as a recipient in the cloning using
plasmid vectors was HW87 or DH5 which has the following
genotype:

araD139(ara-leu)del7697 (lacIPOZY)del74 galU galK hsdR rpsL
srl recA56

RZ1032 is a derivative of E. coli that lacks two enzymes
of DNA metabolism: (a) dUTPase (dut), which results in a
high concentration of intracellular dUTP, and (b) uracil
N-glycosylase (ung), which is responsible for removing
misincorporated uracils from DNA (Kunkel et al. Methods in
Enzymol., 154, 367-382 (1987)). Its principal benefit is that
these mutations lead to a higher frequency of mutants in site
directed mutagenesis. RZ1032 has the following genotype:

HfrKL16 PO/45 [lysA(61-62)], dut1, ung1, thi1, relA1,
Zbd-279::Tn10, supE44

JM103 is a standard recipient strain for manipulations
involving M13 based vectors.

8. Site Directed Mutagenesis

Kinased mutagenesis primer (2.5 pmole) was annealed to
the single stranded template DNA (1 µg), which was prepared
using RZ1032 as host, in a final reaction mix of 10 µl
containing 70 mM Tris, 10 mM MgCl2. The reaction
mixture in a polypropylene micro-testube (EPPENDORF) was
placed in a beaker containing 250 ml of water at 70° C. for
3 minutes followed by 37° C. for 30 minutes. The annealed
mixture was then placed on ice and the following reagents
added: 1 µl of 10× TM (700 mM Tris, 100 mM MgCl2
pH 7.6), 1 µl of a mixture of all 4 deoxyribonucleotide
triphosphates each at 5 mM, 2 µl of T4 DNA ligase (100 u),
0.5 µl Klenow fragment of DNA polymerase and 4.5 µl of
water. The polymerase reaction mixture was then incubated
at 15° for 4-16 hrs. After the reaction was complete, 180 µl
of TE (10 mM Tris, 1 mM EDTA pH 8.0) was added and the
mutagenesis mixture stored at -20° C. For the isolation of
mutant clones the mixture was then transformed into the
recipient JM103 as follows. A 5 ml overnight culture of
JM103 in 2× YT (1.6% Bactotryptone, 1% Yeast Extract,
1% NaCl) was diluted 1 in 100 into 50 ml of pre-warmed
2× YT. The culture was grown at 37° with aeration until the
A600 reached 0.4. The cells were pelleted and resuspended
in 0.5 vol of 50 mM CaCl2 and kept on ice for 15 mins. The
cells were then re-pelleted at 4° and resuspended in 2.5 ml
cold 50 mM CaCl2. For the transfection, 0.25, 1, 2, 5, 20 and
50 µl aliquots of the mutagenesis mixture were added to 200
µl of competent cells which were kept on ice for 30 mins.
The cells were then heat shocked at 42° for 2 mins. To
each tube was then added 3.5 ml of YT soft agar containing
0.2 ml of a late exponential culture of JM103, the contents
were mixed briefly and then poured onto the surface of a
pre-warmed plate containing 2× YT solidified with 1.5%
agar. The soft agar layer was allowed to set and the plates
then incubated at 37° overnight.

Single stranded DNA was then prepared from isolated
clones as follows: Single plaques were picked into 4 ml of
2× YT that had been seeded with 10 µl of a fresh overnight
culture of JM103 in 2× YT. The culture was shaken
vigorously for 6 hrs. 0.5 ml of the culture was then removed
and added to 0.5 ml of 50% glycerol to give a reference
stock that was stored at -20°. The remaining culture was
centrifuged to remove the cells and 1 ml of supernatant
carrying the phage particles was transferred to a fresh
EPPENDORF tube. 250 µl of 20% PEG6000, 250 mM NaCl
was then added, mixed and the tubes incubated on ice for 15
mins. The phage were then pelleted at 10,000 rpm for 10
mins, the supernatant discarded and the tubes re-centrifuged
to collect the final traces of PEG solution which could then
be removed and discarded. The phage pellet was thoroughly
resuspended in 200 µl of TEN (10 mM Tris, 1 mM EDTA,
0.3M NaOAc). The DNA was isolated by extraction with an
equal volume of Tris saturated phenol. The phases were
separated by a brief centrifugation and the aqueous phase
transferred to a clean tube. The DNA was re-extracted with
a mixture of 100 µl of phenol, 100 µl chloroform and the
phases again separated by centrifugation. Traces of phenol
were removed by three subsequent extractions with
chloroform and the DNA finally isolated by precipitation with 2.5
volumes of ethanol at -20° overnight. The DNA was
pelleted at 10,000 rpm for 10 min, washed in 70% ethanol,
dried and finally resuspended in 50 µl of TE.

9. Electroporation

Chinese hamster ovary cells (CHO) or the mouse
myeloma cell line p3x63-Ag8.653 were grown and
harvested in mid log growth phase. The cells were washed and
resuspended in PBS and a viable cell count was made. The
cells were then pelleted and resuspended at 1×10^7 cells/ml.
40 µg of linearised DNA was added to 1 ml of cells and
allowed to stand on ice for 15 mins. One pulse of 800 V/25
µF was administered to the cells using a commercially
available electroporation apparatus (BIORAD GENE
PULSER, trade mark). The cells were incubated on ice for
a further 15 mins and then plated into 5×96 well plates with
200 µl of medium per well (DMEM, 5% FCS, Pen/Strep,
glutamine) or 3×9 cm dishes with 10 ml medium in each
dish and incubated overnight. After 24 hrs the medium was
removed and replaced with selective media containing
xanthine (250 µg/ml), mycophenolic acid (5 µg/ml) and
1× hypoxanthine-thymidine (HT). The cells were fed every
third day. After about 14 days gpt resistant colonies are
evident in some of the wells and on the plates. The plates
were screened for plasminogen by removing an aliquot of
medium from each well or plate and assaying it using an ELISA
assay. Clones producing plasminogen were scaled up and the
expression level monitored to allow the selection of the best
producer.
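
The plating step above implies a fixed seeding density once the cell count is known. The sketch below works through the arithmetic for both plating formats, assuming that the whole 1 ml of electroporated cells is divided evenly among the wells or dishes with no losses (an idealisation not stated in the text); the function names are hypothetical.

```python
def resuspension_volume_ml(total_viable_cells: float,
                           target_cells_per_ml: float = 1e7) -> float:
    """Volume in which to resuspend the pellet to reach the target density."""
    return total_viable_cells / target_cells_per_ml


def plating_summary(cells_electroporated: float, n_vessels: int,
                    medium_per_vessel_ml: float) -> dict:
    """Cells per vessel and total medium, assuming an even split."""
    return {
        "cells_per_vessel": cells_electroporated / n_vessels,
        "total_medium_ml": n_vessels * medium_per_vessel_ml,
    }


# 1 ml of cells at 1e7 cells/ml, plated as described in the text.
wells = plating_summary(1e7, 5 * 96, 0.2)    # five 96-well plates, 200 ul each
dishes = plating_summary(1e7, 3, 10.0)       # three 9 cm dishes, 10 ml each
print(wells)
print(dishes)
```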

10. ELISA for Human Plasminogen 

ELISA plates (Pro-Bind, Falcon) are coated with 50
µl/well of goat anti-human plasminogen serum (Sigma)
diluted 1:1000 in coating buffer (4.0 g Na2CO3·10H2O,
2.93 g NaHCO3 per liter H2O, pH 9.6) and incubated
overnight at 4° C. Coating solution is then removed and
plates are blocked by incubating with 50 µl/well of PBS/
0.1% casein at room temperature for 15 minutes. Plates are
then washed 3 times with PBS/0.05% Tween 20. Samples of
plasminogen or standards diluted in PBS/Tween are added to
the plate and incubated at room temperature for 2 hours. The
plates are then washed 3 times with PBS/Tween and then 50
µl/well of a 1:1000 dilution in PBS/Tween of a monoclonal
antihuman plasminogen antibody (eg #3641 and #3642 from
American Diagnostica, New York, U.S.A.) is added and
incubated at room temperature for 1 hour. The plates are
again washed 3 times with PBS/Tween and then 50 µl/well
of horse radish peroxidase conjugated goat anti-mouse IgG
(Sigma) is added and incubated at room temperature for 1
hour. Alternatively, the bound plasminogen is revealed by
incubation with 50 µl/well of horse radish peroxidase
conjugated sheep anti-human plasminogen (The Binding Site).
The plates are washed 5 times with PBS/Tween and then
incubated with 100 µl/well of peroxidase substrate (0.1M
sodium acetate/citric acid buffer pH 6.0 containing 100
mg/liter 3,3',5,5'-tetramethyl benzidine and 13 mM H2O2).
The reaction is stopped after approximately 5 minutes by the
addition of 25 µl/well of 2.5M sulphuric acid and the
absorbance at 450 nm read on a platereader.

11. Purification of Plasminogen Variants

Plasminogen variants are purified in a single step by
chromatography on lysine SEPHAROSE 4B (Pharmacia). A
column is equilibrated with at least 10 column volumes of
0.05M sodium phosphate buffer pH 7.5. The column is
loaded with conditioned medium at a ratio of 1 ml resin per
0.6 mg of plasminogen variant as determined by ELISA
using human glu-plasminogen as standard. Typically 400 ml
of conditioned medium containing plasminogen are applied
to a 10 ml column (H:D=4) at a linear flow rate of 56
ml/cm/h at 4° C. After loading is complete, the column is
washed with a minimum of 5 column volumes of 0.05M
phosphate buffer pH 7.5 containing 0.5M NaCl until non-
specifically bound protein ceases to be eluted. Desorption of
bound plasminogen is achieved by the application of 0.2M
epsilon-amino-caproic acid in de-ionised water pH 7.0.
Elution requires 2 column volumes and is carried out at a
linear flow rate of 17 ml/cm/h. Following analysis by SDS
PAGE to check purity, epsilon-amino-caproic acid is
subsequently removed and replaced with a suitable buffer,
eg Tris, PBS, HEPES or acetate, by chromatography on
pre-packed, disposable, PD10 columns containing SEPHA-
DEX G-25M (Pharmacia). (The word SEPHADEX is a trade
mark.) Typically, 2.5 ml of each plasminogen mutant at a
concentration of 0.3 mg/ml are processed in accordance with
the manufacturers' instructions. Fractions containing
plasminogen, as determined by A280 are then pooled.
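
The loading ratio stated above (1 ml resin per 0.6 mg of plasminogen variant) can be turned into a small sizing calculation. The following Python lines are only an illustrative sketch and are not part of the original protocol; the function names are hypothetical, and the only numbers used are the ratio and the typical 10 ml column and 400 ml load given in the text.

```python
# Illustrative sketch only: function names are hypothetical, not from the source.
MG_PER_ML_RESIN = 0.6  # loading ratio from the text: 0.6 mg variant per 1 ml resin


def resin_volume_ml(plasminogen_mg: float) -> float:
    """Resin volume needed to load a given mass of plasminogen variant."""
    if plasminogen_mg < 0:
        raise ValueError("mass must be non-negative")
    return plasminogen_mg / MG_PER_ML_RESIN


def column_capacity_mg(resin_ml: float) -> float:
    """Mass of plasminogen variant that a column of this size is loaded with."""
    if resin_ml < 0:
        raise ValueError("volume must be non-negative")
    return resin_ml * MG_PER_ML_RESIN


def implied_titre_mg_per_ml(resin_ml: float, medium_ml: float) -> float:
    """Medium concentration at which the load exactly meets the stated ratio."""
    return column_capacity_mg(resin_ml) / medium_ml


if __name__ == "__main__":
    # The typical case in the text: 400 ml of medium on a 10 ml column.
    print(column_capacity_mg(10.0))
    print(implied_titre_mg_per_ml(10.0, 400.0))
```

Under the stated ratio, the typical 10 ml column corresponds to about 6 mg of variant, which for a 400 ml load implies a medium concentration of roughly 0.015 mg/ml as measured by ELISA.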



SEQUENCE LISTING 



(1) GENERAL INFORMATION:

(iii) NUMBER OF SEQUENCES: 10

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 690 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..690
(D) OTHER INFORMATION: /partial
/codon_start=1
/function="encodes plasmin protease domain"
/product="nucleotide with corresponding protein"
/number=1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

GTT GTA GGG GGG TGT GTG GCC CAC CCA CAT TCC TGG CCC TGG CAA GTC   48
Val Val Gly Gly Cys Val Ala His Pro His Ser Trp Pro Trp Gln Val
1               5                   10                  15

AGT CTT AGA ACA AGG TTT GGA ATG CAC TTC TGT GGA GGC ACC TTG ATA   96
Ser Leu Arg Thr Arg Phe Gly Met His Phe Cys Gly Gly Thr Leu Ile
            20                  25                  30

TCC CCA GAG TGG GTG TTG ACT GCT GCC CAC TGC TTG GAG AAG TCC CCA  144
Ser Pro Glu Trp Val Leu Thr Ala Ala His Cys Leu Glu Lys Ser Pro
        35                  40                  45

AGG CCT TCA TCC TAC AAG GTC ATC CTG GGT GCA CAC CAA GAA GTG AAT  192
Arg Pro Ser Ser Tyr Lys Val Ile Leu Gly Ala His Gln Glu Val Asn
    50                  55                  60

CTC GAA CCG CAT GGT CAG GAA ATA GAA GTG TCT AGG CTG TTC TTG GAG  240
Leu Glu Pro His Gly Gln Glu Ile Glu Val Ser Arg Leu Phe Leu Glu
65                  70                  75                  80

CCC ACA CGA AAA GAT ATT GCC TTG CTA AAG CTA AGC AGT CCT GCC GTC  288
Pro Thr Arg Lys Asp Ile Ala Leu Leu Lys Leu Ser Ser Pro Ala Val
                85                  90                  95

ATC ACT GAC AAA GTA ATC CCA GCT TGT CTG CCA TCC CCA AAT TAT GTG  336
Ile Thr Asp Lys Val Ile Pro Ala Cys Leu Pro Ser Pro Asn Tyr Val
            100                 105                 110

GTC GCT GAC CGG ACC GAA TGT TTC ATC ACT GGC TGG GGA GAA ACC CAA  384
Val Ala Asp Arg Thr Glu Cys Phe Ile Thr Gly Trp Gly Glu Thr Gln
        115                 120                 125

GGT ACT TTT GGA GCT GGC CTT CTC AAG GAA GCC CAG CTC CCT GTG ATT  432
Gly Thr Phe Gly Ala Gly Leu Leu Lys Glu Ala Gln Leu Pro Val Ile
    130                 135                 140

GAG AAT AAA GTG TGC AAT CGC TAT GAG TTT CTG AAT GGA AGA GTC CAA  480
Glu Asn Lys Val Cys Asn Arg Tyr Glu Phe Leu Asn Gly Arg Val Gln
145                 150                 155                 160

TCC ACC GAA CTC TGT GCT GGG CAT TTG GCC GGA GGC ACT GAC AGT TGC  528
Ser Thr Glu Leu Cys Ala Gly His Leu Ala Gly Gly Thr Asp Ser Cys
                165                 170                 175

CAG GGT GAC AGT GGA GGT CCT CTG GTT TGC TTC GAG AAG GAC AAA TAC  576
Gln Gly Asp Ser Gly Gly Pro Leu Val Cys Phe Glu Lys Asp Lys Tyr
            180                 185                 190

ATT TTA CAA GGA GTC ACT TCT TGG GGT CTT GGC TGT GCA CGC CCC AAT  624
Ile Leu Gln Gly Val Thr Ser Trp Gly Leu Gly Cys Ala Arg Pro Asn
        195                 200                 205

AAG CCT GGT GTC TAT GTT CGT GTT TCA AGG TTT GTT ACT TGG ATT GAG  672
Lys Pro Gly Val Tyr Val Arg Val Ser Arg Phe Val Thr Trp Ile Glu
    210                 215                 220

GGA GTG ATG AGA AAT AAT                                          690
Gly Val Met Arg Asn Asn
225                 230



(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 230 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Val Val Gly Gly Cys Val Ala His Pro His Ser Trp Pro Trp Gln Val
1               5                   10                  15

Ser Leu Arg Thr Arg Phe Gly Met His Phe Cys Gly Gly Thr Leu Ile
            20                  25                  30

Ser Pro Glu Trp Val Leu Thr Ala Ala His Cys Leu Glu Lys Ser Pro
        35                  40                  45

Arg Pro Ser Ser Tyr Lys Val Ile Leu Gly Ala His Gln Glu Val Asn
    50                  55                  60

Leu Glu Pro His Gly Gln Glu Ile Glu Val Ser Arg Leu Phe Leu Glu
65                  70                  75                  80

Pro Thr Arg Lys Asp Ile Ala Leu Leu Lys Leu Ser Ser Pro Ala Val
                85                  90                  95

Ile Thr Asp Lys Val Ile Pro Ala Cys Leu Pro Ser Pro Asn Tyr Val
            100                 105                 110

Val Ala Asp Arg Thr Glu Cys Phe Ile Thr Gly Trp Gly Glu Thr Gln
        115                 120                 125

Gly Thr Phe Gly Ala Gly Leu Leu Lys Glu Ala Gln Leu Pro Val Ile
    130                 135                 140

Glu Asn Lys Val Cys Asn Arg Tyr Glu Phe Leu Asn Gly Arg Val Gln
145                 150                 155                 160

Ser Thr Glu Leu Cys Ala Gly His Leu Ala Gly Gly Thr Asp Ser Cys
                165                 170                 175

Gln Gly Asp Ser Gly Gly Pro Leu Val Cys Phe Glu Lys Asp Lys Tyr
            180                 185                 190

Ile Leu Gln Gly Val Thr Ser Trp Gly Leu Gly Cys Ala Arg Pro Asn
        195                 200                 205

Lys Pro Gly Val Tyr Val Arg Val Ser Arg Phe Val Thr Trp Ile Glu
    210                 215                 220

Gly Val Met Arg Asn Asn
225                 230

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1..24
(D) OTHER INFORMATION: /function="MUTAGENESIS PRIMER FOR A1"
/product="SYNTHETIC DNA"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

GGTGCCTCCA CAATTGTGCA TTCC 24

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1..27
(D) OTHER INFORMATION: /function="MUTAGENESIS PRIMER FOR A3"
/product="SYNTHETIC DNA"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

GTTCGAGATT CACTTTTTGG TGTGCAC 27

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1..24
(D) OTHER INFORMATION: /function="MUTAGENESIS PRIMER FOR A4"
/product="SYNTHETIC DNA"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

CTTGGGGACT TCTTCAAGCA GTGG 24



(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1..24
(D) OTHER INFORMATION: /function="MUTAGENESIS PRIMER USED FOR A5"
/product="SYNTHETIC DNA"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

CTCGTACGAA GCAGGACTTG CCAG 24

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1..27
(D) OTHER INFORMATION: /function="MUTAGENESIS PRIMER FOR A12"
/product="SYNTHETIC DNA"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

CCAAACCTTG TTTCAAGACT GACTTGC 27



(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1..24
(D) OTHER INFORMATION: /function="MUTAGENESIS PRIMER FOR A14"
/product="SYNTHETIC DNA"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

CTTGGGGACT TGGCTAGACA GTGG 24

(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1..25
(D) OTHER INFORMATION: /function="MUTAGENESIS PRIMER FOR A15"
/product="SYNTHETIC DNA"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

CTTGGGGACT TCCTTAGACA GTGGG 25

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: synthetic DNA

(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1..27
(D) OTHER INFORMATION: /function="MUTAGENESIS PRIMER FOR A16"
/product="SYNTHETIC DNA"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

GTTCGAGATT CACTGCTTGG TGTGCAC 27



We claim: 

1. A plasmin modified so as to exhibit resistance to 
inhibitors of plasmin, characterized in that the modification 
comprises the mutation of the residue in a region corre- 
sponding to residue 17 according to the numbering of SEQ 
ID NO 2. 

2. A plasmin modified so as to exhibit resistance to 
inhibitors of plasmin, characterized in that the modification 
comprises the mutation of one or more residues in a region 
corresponding to residues 44 to 54 according to the num- 
bering of SEQ ID NO 2. 

3. A plasmin modified so as to exhibit resistance to 
inhibitors of plasmin, characterized in that the modification 
comprises the mutation of the residue in a region corre- 
sponding to residue 45 according to the numbering of SEQ 
ID NO 2. 

4. A plasmin modified so as to exhibit resistance to 
inhibitors of plasmin, characterized in that the modification 
comprises the mutation of the residue in a region corre- 
sponding to residue 62 according to the numbering of SEQ 
ID NO 2. 

5. A plasmin modified so as to exhibit resistance to
inhibitors of plasmin, characterized in that the modification
comprises the mutation of one or more residues in a region
corresponding to residues 202 or 203 according to the
numbering of SEQ ID NO 2.

6. A plasmin modified so as to exhibit resistance to
inhibitors of plasmin, characterized in that the modification
comprises the mutation of one or more residues in a region
or regions corresponding to residues 17, 44 to 54, 62, 202
and 203, according to the numbering of SEQ ID NO 2.

7. A plasmin as claimed in claim 6, which has one or more 
of the following mutations: Ser-17 to Leu, Glu-45 to Lys or 
Arg, or Glu-62 to Lys or Ala, according to the numbering of 
SEQ ID NO 2. 

8. A plasmin as claimed in claim 7, which has the
following mutations: Glu-62 to Lys and Glu-45 to Lys,
according to the numbering of SEQ ID NO 2.

9. A plasmin precursor, which, when cleaved, forms a
plasmin modified so as to exhibit resistance to inhibitors of
plasmin, characterized in that the modification comprises the
mutation of one or more residues in a region or regions
corresponding to residues 17, 44 to 54, 62, 202, and 203,
according to the numbering of SEQ ID NO 2.

10. A plasmin precursor, which, when cleaved, forms a 
modified plasmin as claimed in claim 7. 



11. A plasmin precursor, which, when cleaved, forms said
modified plasmin of claim 8.

12. An isolated nucleotide sequence coding for said
plasmin precursor of claim 9.

13. The isolated nucleotide sequence of claim 12, further
comprising a first nucleic acid sequence coding for said
modified plasmin, operatively linked to a second nucleic
acid sequence containing a strong promoter and enhancer
sequence derived from human cytomegalovirus, a third
nucleic acid sequence encoding a polyadenylation sequence
derived from SV40 and a fourth nucleic acid sequence
coding for a selectable marker expressed from an SV40
promoter and having an additional SV40 polyadenylation
signal at the 3' end of the selectable marker sequence.

14. An expression vector comprising the nucleic acid
sequence as in claims 12 or 13.

15. The vector of claim 14, wherein said vector is selected
from the group consisting of a plasmid, a cosmid, and a
phage.

16. A cell transformed or transfected with the expression
vector of claim 14.

17. The cell of claim 16, wherein said cell is additionally
transfected or transformed by an expression vector compris-
ing a nucleic acid sequence coding for a plasmin inhibitor.

18. The cell of claim 17, wherein said plasmin inhibitor is
selected from the group consisting of alpha2-antiplasmin,
alpha2-macroglobulin and alpha1-antitrypsin.

19. A pharmaceutical composition comprising a modified
plasmin as claimed in any one of claims 1 to 8, together with
a pharmaceutically acceptable carrier.

20. A pharmaceutical composition comprising a modified
plasmin precursor as claimed in any one of claims 9 to 11,
together with a pharmaceutically acceptable carrier.

21. A veterinary composition for use in mammals, com-
prising a modified plasmin as claimed in any one of claims
1 to 8, together with a carrier acceptable for veterinary use.

22. A veterinary composition for use in mammals, com-
prising a modified plasmin precursor as claimed in any one
of claims 9 to 11, together with a carrier acceptable for
veterinary use.



UNITED STATES PATENT AND TRADEMARK OFFICE

CERTIFICATE OF CORRECTION

PATENT NO. : 5,645,833 Page 1 of 2

DATED : July 8, 1997

INVENTOR(S) : Keith Martyn Dawson and Richard James Gilbert

It is certified that error appears in the above-identified patent and that said Letters Patent is hereby
corrected as shown below:

In claim 9, at column 24, line 64, after "44 to 54," insert --and--.



UNITED STATES PATENT AND TRADEMARK OFFICE

CERTIFICATE OF CORRECTION

PATENT NO. : 5,645,833 Page 2 of 2

DATED : July 8, 1997

INVENTOR(S) : Keith Martyn Dawson and Richard James Gilbert

It is certified that error appears in the above-identified patent and that said Letters Patent is hereby
corrected as shown below:

At column 23, lines 66 to 67 and column 24, lines 45 to 47, cancel claim 5.
In claim 6, at column 24, lines 51 to 52, cancel "202 and 203".
In claim 6, at column 24, line 51, after "44 to 54," insert --and--.
In claim 9, at column 24, line 64, cancel "202, and 203".

Attest:

Signed and Sealed this
Third Day of February, 1998

BRUCE LEHMAN

Attesting Officer    Commissioner of Patents and Trademarks



UNITED STATES PATENT AND TRADEMARK OFFICE

CERTIFICATE OF CORRECTION

PATENT NO. : 5,645,833 Page 1 of 2

DATED : July 8, 1997

INVENTOR(S) : Keith Martyn Dawson and Richard James Gilbert

It is certified that error appears in the above-identified patent and that said Letters Patent is hereby
corrected as shown below:

In claim 9, at column 24, line 64, after "44 to 54," insert --and--.

In claim 13, at column 25, line 7, after "modified plasmin," [remainder illegible]



UNITED STATES PATENT AND TRADEMARK OFFICE

CERTIFICATE OF CORRECTION

PATENT NO. : 5,645,833 Page 2 of 2

DATED : July 8, 1997

INVENTOR(S) : Keith Martyn Dawson and Richard James Gilbert

It is certified that error appears in the above-identified patent and that said Letters Patent is hereby
corrected as shown below:

At column 23, lines 66 to 67 and column 24, lines 45 to 47, cancel claim 5.
In claim 6, at column 24, lines 51 to 52, cancel "202 and 203".
In claim 6, at column 24, line 51, after "44 to 54," insert --and--.
In claim 9, at column 24, line 64, cancel "202, and 203".

Signed and Sealed this
Third Day of February, 1998

BRUCE LEHMAN

Attesting Officer    Commissioner of Patents and Trademarks





Exhibit 12 



Volume 12 Number 1 1984



Nucleic Acids Research 



A comprehensive set of sequence analysis programs for the VAX 



John Devereux, Paul Haeberli* and Oliver Smithies



Laboratory of Genetics, University of Wisconsin, Madison, WI 53706, USA



Received 18 August 1983 



ABSTRACT 

The University of Wisconsin Genetics Computer Group (UWGCG) has been
organized to develop computational tools for the analysis and publication of
biological sequence data. A group of programs that will interact with each
other has been developed for the Digital Equipment Corporation VAX computer
using the VMS operating system. The programs available and the conditions for
transfer are described.

INTRODUCTION

The rapid advances in the field of molecular genetics and DNA sequencing
have made it imperative for many laboratories to use computers to analyze and
manage sequence data. UWGCG was founded when it became clear to several
faculty members at the University of Wisconsin that there was no set of
sequence analysis programs that could be used together as a coherent system
and be modified easily in response to new ideas.

With intramural support a computer group was organized to build a strong
foundation of software upon which future programs in molecular genetics could
be based. This initial project has been completed and the resulting programs,
written in Fortran 77, are available for VAX computers using the VMS operating
system. Most of the programs can be used with only a terminal, although
several require a Hewlett Packard plotter.

UWGCG software has been installed for testing at eight different
institutions. A simple method has been developed for transferring and
maintaining this system on other VAX computers.

DESIGN PRINCIPLES 

UWGCG program design is based on the "software tools" approach of
Kernighan and Plauger(1). Each program performs a simple function and is easy
to use. The programs can be used independently in different combinations so
that complex problems are solved by the use of several programs in succession.
New programming is simplified since less effort is required to bridge a gap
between existing programs.

UWGCG software is designed to be maintained and modified at sites other
than the University of Wisconsin. The program manual is extensive and the
source codes are organised to make modification convenient. Scientists using
UWGCG software are encouraged to use existing programs as a framework for
developing new ones. Our copyright can be removed from any program modified
by more than 25% of our original effort.

© IRL Press Limited, Oxford, England.

PROGRAMS AVAILABLE FROM UWGCG 

The programs described below are named and defined individually in Table 1. 

Program names in the text are underlined. 

Comparisons 

Comparisons may be done with "dot plots" using the method of Maizel and
Lenk(2). Optimal alignments can be generated by the methods of Needleman and
Wunsch(3), of Sellers(4), and the "local homology" method of Smith and
Waterman(5). The Smith and Waterman alignment algorithm is also the most
sensitive method available for identifying similarities between weakly related
sequences.
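
The "local homology" idea can be illustrated with a short dynamic-programming sketch. The Python below is not the UWGCG Fortran implementation; it is a minimal version of Smith-Waterman local alignment under assumed scoring values (match +2, mismatch -1, linear gap penalty -2), chosen here only for illustration.

```python
# Minimal Smith-Waterman sketch; scoring values are illustrative assumptions.
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTTACGTACGTTT", "GGGACGTACGGGG"))
```

Because every cell is floored at zero, the alignment can start and end anywhere in either sequence, which is what distinguishes a local alignment from the end-to-end alignment of Needleman and Wunsch.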

Mapping and Searching 

Mapping is available in several formats. Graphic maps display all of the 
cuts for each restriction enzyme on parallel lines. This graphic map 
facilitates selection of enzymes for isolating any region of a sequenced DNA 
molecule. Sorted maps in tabular format arrange the fragments from any 
digestion in order of molecular weight to show which fragments are similar in 
size and thus likely to be confused in gels. Another frequently used mapping 
format, designed by Frederick Blattner(6), displays the enzyme cuts above the 
original DNA sequence. Both strands of the DNA and all six frames of 
translation are shown. 
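
As a rough illustration of the sorted-map idea, the sketch below locates the cuts of one enzyme and lists the resulting fragments largest first. It is written in Python rather than the Fortran 77 of the original programs, and it makes several simplifying assumptions that are not from the source: a linear molecule, an exact (unambiguous) recognition site, fragment length used as a stand-in for molecular weight, and hypothetical function names.

```python
# Illustrative sketch; function names and simplifications are assumptions.
def find_cut_positions(seq, site, cut_offset):
    """Return 0-based top-strand cut positions for every occurrence of site.

    cut_offset is the number of bases of the site that lie 5' of the cut,
    e.g. 1 for EcoRI (G^AATTC).
    """
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = seq.find(site, start + 1)  # allow overlapping sites
    return positions


def fragment_lengths(seq_len, cuts):
    """Lengths of the fragments of a linear molecule, in map order."""
    bounds = [0] + sorted(cuts) + [seq_len]
    return [right - left for left, right in zip(bounds, bounds[1:])]


def sorted_fragments(seq, site, cut_offset):
    """Fragment lengths in decreasing size, as on a gel read top to bottom."""
    cuts = find_cut_positions(seq, site, cut_offset)
    return sorted(fragment_lengths(len(seq), cuts), reverse=True)


if __name__ == "__main__":
    demo = "AAA" + "GAATTC" + "CCCCC" + "GAATTC" + "TT"
    print(find_cut_positions(demo, "GAATTC", 1))
    print(sorted_fragments(demo, "GAATTC", 1))
```

Fragments that end up adjacent in the sorted list are the ones most likely to be confused on a gel, which is the use the text describes for the tabular map.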

All mapping programs will search for user-specified sequences, allowing
features to be marked at the appropriate position on a restriction map. The
mapping and searching programs can be used to aid site-specific mutagenesis
experiments by showing where mutations could generate new restriction sites.
All of the positions in a sequence where a synthetic probe could pair with one
or more mismatches can also be located. Sequences related to less precisely
defined features such as promoters or intervening sequence splice sites, can
be located with a program that uses a consensus sequence as a probe. The
mapping programs can also be used on protein sequences to identify the
peptides resulting from proteolytic cleavage.

Table 1

Programs Available from UWGCG

Name              Function

DotPlot+          makes a dot plot by method of Maizel and Lenk(2)
Gap               finds optimal alignment by method of Needleman and Wunsch(3)
BestFit           finds optimal alignment by method of Smith and Waterman(5)
MapPlot+          shows restriction map for each enzyme graphically
MapSort           tabulates maps sorted by fragment position and size
Map               displays restriction sites and protein translations above
                  and below the original sequence (Blattner, 6)
Consensus         [function not legible in source]
FitConsensus      finds sequences similar to a consensus sequence using a
                  consensus table as a probe
Find              finds sites specified interactively
StemLoop          finds all possible stems (inverted repeats) and loops
Fold*             finds an RNA secondary structure of minimum free energy
                  by the method of Zuker(7)
CodonPreference+  plots the similarity between the codon choices in each
                  reading frame and a codon frequency table(8)
CodonFrequency    tabulates codon frequencies
Correspond        finds similar patterns of codon choice by comparing
                  codon frequency tables (Grantham et al., 9)
TestCode+         finds possible coding regions by plotting
                  the "TestCode" statistic of Fickett(10)
Frame+            plots rare codons and open reading frames(8)
PlotStatistics+   plots asymmetries of composition for one strand
Composition       measures composition, di and trinucleotide frequencies
Repeat            finds repeats (direct, not inverted)
Fingerprint       shows the labelled fragments expected for an RNA fingerprint
Seqed             screen oriented sequence editor for entering, editing
                  and checking sequences
Assemble          joins sequences together
Shuffle           randomizes a sequence maintaining composition
Reverse           reverses and/or complements a sequence
Reformat          converts a sequence file from one format to another
Translate         translates a nucleotide into a peptide sequence
BackTranslate     translates a peptide into a nucleotide sequence
Spew              sends a sequence to another computer
GetSeq            accepts a sequence from another computer
Crypt             encrypts a file for access only by password
Simplify          substitutes one of six chemically similar amino acid
                  families for each residue in a peptide sequence
Publish           arranges sequences for publication
Poster+           plots text (for labelling figures and posters)
OverPrint         prints darkened text for figures with a daisy wheel printer

+ requires a Hewlett Packard Series 7221 terminal plotter
* Fold is distributed by Dr. Michael Zuker not UWGCG.

Secondary Structure

Three programs are available to examine secondary structure in nucleic
acids. The program StemLoop identifies all inverted repeats. An
implementation of Dr. Michael Zuker's Fold program(7) finds an RNA secondary
structure of minimum free energy based on published values of stacking and
loop destabilizing energies. The "dot plot" comparison (mentioned above) of a
sequence compared to its opposite strand gives a graphic picture of the
pattern of inverted repeats in a sequence.

Analysis of Composition and the Location of Genetic Domains 

Regions of a sequence with non-random base distribution can be displayed
with three graphic tools designed to identify genetic domains. The program
CodonPreference(8) identifies potential coding regions by searching through
each reading frame for a pattern of preferred codon choices. The
CodonPreference plot predicts the level of translational expression of mRNAs
and helps identify frame shifts in DNA sequence data. Patterns of codon
choice can be compared with the program Correspond(9). When a strong pattern
of codon preferences is not expected, the "TestCode" statistic of Fickett(10)
can be plotted to show regions of compositional constraint at every third
base. Another program plots asymmetries of composition by strand. Strand
asymmetries have been associated with genetic domains by several
authors(11)(12). A fourth program called Frame marks the positions of rare
codons and open reading frames on a graph showing all six reading frames.

Several tools are available to measure content and to count dinucleotide,
trinucleotide, neighbor and repeat frequencies. A program that predicts RNA
fingerprint patterns and another that tabulates codon frequencies complete the
group of programs that analyze composition.
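
The counting operations mentioned above are simple to sketch. The Python below is an illustration only (the original programs are in Fortran 77); the function names are hypothetical, and overlapping windows are assumed for di- and trinucleotide counts while codons are counted in non-overlapping steps of three from a chosen frame.

```python
# Illustrative sketch; function names are hypothetical.
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-base word (k=1 bases, k=2 dinucleotides, ...)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def codon_counts(seq, frame=0):
    """Count non-overlapping codons starting at the given frame (0, 1 or 2)."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))


if __name__ == "__main__":
    print(kmer_counts("GATTACA", 1))
    print(kmer_counts("GATTACA", 2))
    print(codon_counts("ATGGCCATG"))
```

A codon frequency table of the kind compared by Correspond could be built by normalizing such counts, although the exact table format used by UWGCG is not described in this section.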
Sequence Manipulation 

Sequences may be entered, assembled, edited, reversed, randomized, 
reformatted, translated, back-translated, documented, transferred, or 
encrypted rapidly with a large set of sequence manipulation tools. 

A screen-oriented editor is available that allows sequences to be entered
and checked. After a sequence is entered, it may be reentered for
proofreading. Whenever a reentered base is at variance with the original, the
terminal bell rings and the position is marked. Existing sequences can be
edited quickly by moving directly to a sequence position specified by either a
coordinate or a sequence pattern. The program can reassign the terminal's
keys to place G, A, T and C conveniently under the fingers of one hand in the
same order as the lanes of a sequencing gel.
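
The proofreading step can be sketched as a position-by-position comparison of the original entry with the reentered one. This Python fragment is an illustration under assumptions (the function name is hypothetical, positions are reported 1-based, and a length difference is treated as a mismatch at each extra position); it does not reproduce the actual editor.

```python
# Illustrative sketch; the function name and conventions are assumptions.
from itertools import zip_longest


def proofread(original, reentered, ring_bell=False):
    """Return the 1-based positions where the reentered sequence differs."""
    mismatches = []
    pairs = zip_longest(original, reentered)  # pads the shorter with None
    for position, (first, second) in enumerate(pairs, start=1):
        if first != second:
            mismatches.append(position)
            if ring_bell:
                print("\a", end="")  # ASCII BEL, the terminal bell
    return mismatches


if __name__ == "__main__":
    print(proofread("GATTACA", "GATCACA"))
```

The comparison is case-sensitive on purpose, since the symbol set described later uses lower case to carry meaning.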

Programs are available for changing sequence file format. Sequence data
from any source can be used in UWGCG programs, and sequence files maintained
with UWGCG software can be converted for use in other non-UWGCG programs. For
instance, the programs of Roger Staden(13) or Intelligenetics Inc.(14) could
be used to assemble a sequence from the sequences of many small sub-fragments
generated by DNAase I digestion. The assembled sequence could then be
reformatted for use in any UWGCG program. A program is available that
transfers sequences to and from other computers.

Sequence Publication

A program, Publish, will format sequences into figures. Publish has
alternatives for line size, numbering, scaling, translation and comparison to
other sequences. Poster is a program that will plot text on figures.

GENERAL FEATURES OF UWGCG SOFTWARE 
Interactive Style 

Each program is run by simply typing its name. Every parameter required 
by the program is obtained interactively. Questions are answered with a file 
name, a yes, a no, a number, or a letter from a menu. Default answers are 
displayed. Programs are insensitive to absurd answers and will ask the 
question again if, for instance, you name a file that does not exist or if you 
use a nonnumeric character when typing a number. Special features, such as
plotting features oriented to publication, are obtained by using an extra word
next to the program's name when the program is run. Thus parameter queries
are kept to a minimum for the normal use of each program. 
Data 

Both the NIH-GenBank(15) and the EMBL(16) nucleotide sequence data 
libraries are available "on-line" to any UWGCG program. A Search utility will 
locate sequences in the libraries by key word. A Find utility will locate
library entries containing any specified sequence. A program is available
that installs the new data sent periodically from GenBank and EMBL to update 
their data libraries. 

All of the data in the system are stored in text files that can be read 
and modified easily. Every data file has an English heading describing the 
contents. The data files may be copied by each user for analysis or 
modification. Programs recognize and read user-modified input data 
automatically. Data files can be modified with any text editor. 






Sequence File Structure 

Sequences are maintained in files that allow documentation and numbering 
both above and within the sequence. This file format is compatible with both 
of the nucleic acid sequence libraries and has been adopted as the standard 
sequence file format by the data base project at the European Molecular 
Biology Lab. Because genetic manipulations commonly involve linking several 
molecules of known sequence, UWGCG sequence files are designed to support 
concatenation by allowing comments to appear within the sequences at any 
location. Coding sequences or the boundaries between cloning vector and 
insert, for instance, can be marked within the sequence itself for immediate 
identification. 
Sequence Symbols 

All possible nucleotide ambiguities and all standard one-letter amino
acid codes are part of the UWGCG symbol set that includes all alphabetic
characters plus five additional characters. The proposed IUB-IUPAC standard
nucleotide ambiguity symbols(17) are used for the mapping, searching and
comparison programs. Lower case characters are used in sequences to indicate
uncertainty as distinct from ambiguity. This allows the entire lexicon of
symbols to be reused with the same meaning, but with the prefix "maybe-." This
reuse of the symbol set in lower case makes the uncertainty symbols more
complete, understandable and visible.
Symbol Comparison 

Sequence analysis programs generally make comparisons between sequence 
symbols (bases or amino acids) in order to find enzyme sites, create 
alignments, locate inverted repeats etc. These symbol comparisons are handled 
in several ways. 

Symbol comparisons for alignment, comparison and secondary structure 
analysis are made by looking up a value in a symbol comparison table for the 
quality of the match. The table might contain 1's for matches and 0's for 
mismatches. If amino acids are being compared, however, a real number could 
be assigned at each position based on some previously assigned chemical 
similarity of the pair of residues or on the mutational distance between their 
codons. Standard symbol tables are provided by UWGCG, but the system is 
designed to allow each user to specify his own values. 
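
The table lookup described above can be illustrated with a minimal Python sketch. It is not UWGCG code (the package itself is written in Fortran 77); the function names and the 0.5 value given to the A/G pair are assumptions chosen only to show how a user-supplied value changes the quality of a match.

```python
def make_identity_table(symbols):
    """Build a table holding 1 for identical symbols and 0 for mismatches."""
    return {(a, b): 1.0 if a == b else 0.0 for a in symbols for b in symbols}


def match_quality(seq_a, seq_b, table):
    """Sum the table values over two equal-length, already aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(table[(a, b)] for a, b in zip(seq_a, seq_b))


# Standard table: matches score 1, mismatches score 0.
standard = make_identity_table("GATC")

# User-modified table: the A/G pair is given a partial score.
# The value 0.5 is hypothetical and only illustrates a user override.
custom = dict(standard)
custom[("A", "G")] = custom[("G", "A")] = 0.5

standard_score = match_quality("GATTACA", "GGTTACA", standard)
custom_score = match_quality("GATTACA", "GGTTACA", custom)
```

Because the comparison logic only ever consults the table, replacing the table is enough to change the scoring scheme, which is the design point made in the text.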

Symbol comparisons for mapping and searching operations in nucleic acids 
are made by converting the IUB-IUPAC symbols into a binary code. The bits of 
this code represent G, A, T and C with ambiguity symbols causing more than one 
bit to be set. A group of library functions identify overlap between the bits 
for each IUB-IUPAC symbol. 
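
A short Python sketch of the binary code described above follows. The bit positions assigned to G, A, T and C, the function names, and the use of the present-day IUPAC ambiguity letters are assumptions; the text does not give the actual encoding used by the library. Lower case (uncertain) symbols are simply folded to upper case here.

```python
# One bit per base; the particular bit positions are an assumption.
G, A, T, C = 1, 2, 4, 8

IUPAC_BITS = {
    "G": G, "A": A, "T": T, "C": C,
    "R": A | G, "Y": C | T, "M": A | C, "K": G | T, "S": C | G, "W": A | T,
    "H": A | C | T, "B": C | G | T, "V": A | C | G, "D": A | G | T,
    "N": A | C | G | T,
}


def symbols_overlap(x, y):
    """Two symbols overlap when their codes share at least one set bit."""
    return (IUPAC_BITS[x.upper()] & IUPAC_BITS[y.upper()]) != 0


def find_sites(sequence, pattern):
    """Return 0-based start positions where every pattern symbol overlaps."""
    width = len(pattern)
    return [
        start
        for start in range(len(sequence) - width + 1)
        if all(symbols_overlap(s, p)
               for s, p in zip(sequence[start:start + width], pattern))
    ]


eco_hits = find_sites("GGATCCGAATTC", "GAATTC")   # unambiguous pattern
ambiguous_hits = find_sites("GTCGAC", "GTYRAC")   # pattern with Y and R
```

A single bitwise AND replaces a set-membership test for every symbol pair, which is why this kind of encoding suits searching long sequences.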
Documentation 

Documentation is available both in printed form and on the terminal 
screen. A 350 page manual describes the operation of each program in detail, 
gives practical considerations and shows what will appear on the screen during 
a session with the program. Output files and plots are shown for the session. 
The data for the session shown in the documentation are included with the 
system so that each program's operation can be checked. The "on-line" 
documentation is the same as the manual, but can be changed immediately when a 
program is modified. 

All programs write output to files that are completely documented and 
sensibly organized for input to other programs. The input data, the program 
and the parameters used are clearly identified in every output file. 
Procedure Library 

UWGCG programs are written largely as calls to a library of 250 
procedures designed to manipulate biological sequences. These procedures use 
data and file structures which have been designed to simplify program 
modification. For instance, standard operations such as reading sequences 
from files are always handled by a single library procedure. Thus a change in 
sequence file format requires only one subroutine to be modified for the new 
format to be acceptable to all of the programs in the system. Command 
procedures are available to help modify the library. The procedure library 
can be used by programs written in any language. 

DISTRIBUTION OF UWGCG SOFTWARE 
Intent 

The intent of UWGCG is to make its software available at the lowest 
possible cost to as many scientists as possible. 
Fees 

A fee of $2,000 for non-profit institutions or $4,000 for industries is 
being charged for a tape and documentation for each computer on which UWGCG 
software is installed. While no continuing fee is required, UWGCG software, 
like the field it supports, is changing very rapidly. A consortium of 
industries and academic laboratories is planned to support the project in the 
future. The consortium will entitle its members to periodic updates and to 
influence the direction of new programming undertaken by UWGCG in return for a 
pledge of continuing financial support. 






Copyrights 

UWGCG retains the copyrights to all of its software and UWGCG must be 
contacted before all or any part of its software package is copied or 
transferred to any machine. UWGCG is, however, mandated to provide research 
tools to help scientists working in the area of molecular genetics and we are 
glad to see our source codes become the basis of further programming efforts 
by other scientists. Copyright can be removed for any program modified by 
more than 25% of its original effort. 
Tape Format 

The UWGCG package is usually distributed in VAX/VMS "backup" format on a 
9 track magnetic tape recorded at 1600 bits/inch. The system consists of 
about 1000 files using about 20,000 blocks at 512 bytes/block. The current 
versions of the GenBank and EMBL nucleotide sequence data bases are normally 
included which add another 3,000 files and require another 20,000 blocks. 

Upon request UWGCG will make a card image tape of all of the Fortran 77 
programs and procedures for reading on computers other than the VAX. The card 
image tape is usually provided at 1600 bits/inch with 80 characters/record and 
10 records/block. Adaptation of UWGCG software to systems other than VAX/VMS 
may take considerable effort. 

Equipment Required 

UWGCG programs and command procedures will run on a Digital Equipment 
Corporation (DEC) VAX computer that is using version 3.0 or greater of the DEC 
VMS operating system. A tape drive is necessary; a floating point accelerator 
and a DEC Fortran compiler are helpful, but not required. All programs can be 
run from a DEC VT52 or VT100 terminal. Seven programs, as noted in table 1, 
require a Hewlett Packard 7221 terminal plotter wired in series with the 
terminal. Several utilities support a daisy wheel compatible printer attached 
to the terminal's pass-through port; however, all programs write output files 
suitable for printing on any standard device. 
Inquiries 

Inquiries may be sent to John Devereux at the Laboratory of Genetics, 
University of Wisconsin, Madison, WI, USA 53706, (608) 263-8970. UWGCG is not 
licensed to distribute Fold (7), but the UWGCG implementation is available from 
Michael Zuker, Division of Biological Sciences, National Research Council of 
Canada, 100 Sussex Drive, Ottawa, Canada, K1A 0R6, (613) 992-4182. 

ACKNOWLEDGEMENTS 

UWGCG was started with software written for Oliver Smithies' laboratory 
with NIH support from grants GM 20069 and AM 20120. UWGCG is directed by John 
Devereux and is operated as a part of the Laboratory of Genetics with the 
advice of a steering committee consisting of Richard Burgess, James Dahlberg, 
Walter Fitch, Oliver Smithies and Millard Susman. UWGCG is currently 
supported with intramural funds and with fees paid by the faculty and 
industries using the facility in Madison. This article is paper number 2684 
from the Laboratory of Genetics, University of Wisconsin. 



*Current address: Silicon Graphics Inc., 630 Clyde Court, Mountain View, CA 94043, USA 
REFERENCES 

1. Kernighan, B.W. and Plauger, P.J. (1976) Software Tools, Addison-Wesley 
Publishing Company, Reading, Massachusetts. 

2. Maizel, J.V. and Lenk, R.P. (1981) Proceedings of the National Academy of 
Sciences USA 78, 7665-7669. 

3. Needleman, S.B. and Wunsch, C.D. (1970) Journal of Molecular Biology 48, 
443-453. 

4. Sellers, P.H. (1974) SIAM Journal on Applied Mathematics 26, 787-793. 

5. Smith, T.F. and Waterman, M.S. (1981) Advances in Applied Mathematics 2, 
482-489. 

6. Schroeder, J.L. and Blattner, F.R. (1982) Nucleic Acids Research 10, 
69-84, Figure 1. 

7. Zuker, M. and Stiegler, P. (1981) Nucleic Acids Research 9, 133-148. 

8. Gribskov, M., Devereux, J. and Burgess, R.R. "The Codon Preference Plot: 
Graphic Analysis of Protein Coding Sequences and Gene Expression," 
submitted to Nucleic Acids Research. 

9. Grantham, R., Gautier, C., Gouy, M., Jacobzone, M. and Mercier, R. (1981) 
Nucleic Acids Research 9(1), r43-r74. 

10. Fickett, J.W. (1982) Nucleic Acids Research 10, 5303-5318. 

11. Smithies, O., Engels, W.R., Devereux, J.R., Slightom, J.L. and Shen, S. 
(1981) Cell 26, 345-353. 

12. Smith, T.F., Waterman, M.S. and Sadler, J.R. (1983) Nucleic Acids 
Research 11, 2205-2220. 

13. Staden, R. (1980) Nucleic Acids Research 8, 3673-3694. 

14. Clayton, J. and Kedes, L. (1982) Nucleic Acids Research 10, 305-321. 

15. The GenBank(TM) Genetic Sequence Data Bank is available from Wayne 
Rindone, Bolt Beranek and Newman Inc., 10 Moulton Street, Cambridge, 
Massachusetts 02238, USA. 

16. The EMBL Nucleotide Sequence Data Library is available from Greg Hamm, 
European Molecular Biology Laboratory, Postfach 10.2209, 
Meyerhofstrasse 1, 6900 Heidelberg, West Germany. 

17. Personal communication from Dr. Richard Lathe, Transgene SA, 11 Rue 
Humann, 67000 Strasbourg, France. 






Exhibit 13 






Biochimica et Biophysica Acta, 1173 (1993) 350-352 
© 1993 Elsevier Science Publishers B.V. All rights reserved 0167-4781/93/$06.00 



BBAEXP 90506 



Short Sequence-Paper 



Cloning and sequence analysis of rat hepsin, a cell surface 
serine proteinase 

David Farley, Françoise Reymond and Hanspeter Nick 

Pharmaceuticals Research, Ciba-Geigy Ltd., Basel (Switzerland) 
(Received 11 February 1993) 

Key words: Hepsin; Serine proteinase; Proteinase, membrane-bound; cDNA sequence; (Rat liver) 

A cDNA coding for the rat serine proteinase hepsin was isolated and its nucleotide sequence has been determined. The cDNA 
was 1739 nucleotides long and contained an open reading frame encoding a protein consisting of 416 amino-acid residues. The 
deduced amino-acid sequence of the rat enzyme was very similar to the human hepsin sharing an amino-acid identity of 
88.7%. Hydropathy plots reveal the presence of a short hydrophobic region close to the N-terminus which is believed to be a 
transmembrane domain which anchors the proteinase on the cell surface. The predicted sequence contains the His, Asp and Ser 
residues which make up the catalytic triad common to all serine proteinases. 



Hepsin is a membrane-bound serine proteinase 
which was originally identified from cDNA clones iso- 
lated from human liver libraries [1]. The role of this 
proteinase is not known and the protein is poorly 
characterized with respect to its physical characteristics 
and substrate specificity. Human hepsin deduced from 
the encoding cDNA consists of 417 amino-acid residues 
and contains a short hydrophobic region near the 
amino-terminus believed to be a membrane spanning 
region. Immunostaining studies of cultured HepG2 cells 
demonstrate that hepsin is localized on the outer cell 
membrane surface with its NH2-terminal side facing 
the cytosol and the carboxyl or catalytic side at the cell 
surface [2,3]. In this paper we report the cloning and 
sequence of the rat liver hepsin gene and compare 
structural similarities with human hepsin and other 
serine proteinases. 

A rat liver cDNA library (Stratagene, No. 936507) 
was screened with a labeled DNA probe corresponding 
to 137 nucleotides at the 3'-end of the rat hepsin 
cDNA. This cDNA probe had previously been isolated 
attached to a rat 5-alpha-reductase cDNA [4]. Six 
positive clones were isolated after screening about 
4.5 • 10^ phage plaques. Restriction analysis of the 
DNA from the positive plaques revealed that the largest 
insert was almost 1800 nucleotides in length. This 
EcoRI fragment was then subcloned into the plasmid 
pBSK- (Stratagene). The DNA insert was self-ligated, 
fractionated by sonication, subcloned into M13mp18 
and both strands were sequenced using the dideoxy 
chain termination method [5]. 

Correspondence to: D. Farley, Ciba-Geigy, K125.117, 4002 Basel, 
Switzerland. 

The nucleotide sequencing data reported in this paper will appear in 
the DDBJ, EMBL and GenBank Nucleotide Sequence Databases 
under the accession number X70900. 

The nucleotide sequence and the deduced amino- 
acid sequence for rat hepsin are shown in Fig. 1. The 
cDNA presented here is 1739 nucleotides in length and 
contains 184 nucleotides of untranslated sequence at 
the 5'-end, an open reading frame consisting of 1248 
nucleotides encoding a protein of 416 amino-acid 
residues, a TGA stop codon, 304 nucleotides at the 
3'-end and 33 adenine residues believed to make up 
the poly(A) tail. Based on the cDNA sequence, rat 
hepsin would have a predicted molecular mass of 44 930 
Da and contains one potential N-linked carbohydrate 
attachment site at Asn-111. 
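
The lengths quoted in this paragraph can be cross-checked with a few lines of arithmetic. The Python sketch below is only a consistency check on the stated numbers; the variable names are illustrative, and the observation that the listed components add up to 1739 when the 33 adenine residues are left out is a calculation made here, not a statement from the paper.

```python
# Figures as stated in the text.
UTR_5_PRIME = 184        # untranslated nucleotides at the 5'-end
OPEN_READING_FRAME = 1248
STOP_CODON = 3           # TGA
UTR_3_PRIME = 304        # nucleotides at the 3'-end
POLY_A = 33              # adenine residues of the presumed poly(A) tail
PREDICTED_MASS_DA = 44930

# Three nucleotides encode one amino-acid residue.
protein_length = OPEN_READING_FRAME // 3

# Sum of the listed components, leaving out the poly(A) residues.
length_without_poly_a = (
    UTR_5_PRIME + OPEN_READING_FRAME + STOP_CODON + UTR_3_PRIME
)

# Average mass per residue implied by the predicted molecular mass.
mean_residue_mass_da = PREDICTED_MASS_DA / protein_length

print(protein_length, length_without_poly_a, round(mean_residue_mass_da, 1))
```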

Alignment of the deduced amino-acid sequence of 
rat and human hepsin is shown in Fig. 2. The aligned 
amino-acids reveal a large degree of homology with 
about 89% of the amino-acid residues being identical. 
Rat hepsin is one amino-acid residue shorter at the 
amino-terminus than the human enzyme. Like human 
hepsin, rat hepsin contains a 27-amino-acid hydrophobic 
region which is characteristic of a transmembrane 
domain [6]. This region is believed to anchor the 
proteinase on the outer cell membrane in a specific 
orientation with the catalytic domain exposed to the 
extracellular environment. Hepsin does not possess an 
obvious signal sequence but does appear to be synthesized 
as an inactive precursor with an Arg-161-Ile-162 
cleavage site involved in zymogen activation. Cleavage of 
this peptide bond results in a noncatalytic polypeptide 
consisting of 161 amino-acid residues and a carboxy-terminal 
catalytic chain consisting of 255 residues that 
contains several highly conserved regions common to 
serine proteinases. By comparing the hepsin sequence 
presented here to other well-characterized serine 
proteinases, one can predict that the two conserved 
cysteine residues at positions 152 and 276 are involved in 
a disulfide linkage between the noncatalytic and 
catalytic chains of hepsin. Many interesting similarities of 
hepsin to other serine proteinases have already been 
considered by Leytus et al. [1] in their description of 
human hepsin. 

[Fig. 1: nucleotide and amino-acid sequence listing not recoverable from the scan.] 

Fig. 1. cDNA sequence and predicted amino-acid sequence of rat hepsin. Nucleotides are numbered at left and amino-acid residues 
at right. The predicted transmembrane domain is underlined and (▼) represents the proposed zymogen activation cleavage site. The 
catalytic residues are starred and a potential N-linked glycosylation site is indicated by (•). 

[Fig. 2: sequence alignment not recoverable from the scan.] 

Fig. 2. Comparison of the deduced amino-acid sequences of rat and 
human hepsin. Residues in the human sequence that are identical to 
those of the rat are represented by a single dot and differences are 
indicated. 
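
The convention used in Fig. 2, where a dot in the human sequence stands for a residue identical to the rat residue in the same column, lends itself to a small worked example. The Python sketch below assumes that convention only; the function names are illustrative, the two short strings are invented toy data rather than hepsin sequence, and skipping gap columns when counting is an assumption about how identity is scored.

```python
def expand_dot_alignment(reference, dotted):
    """Replace each '.' in `dotted` with the reference residue of that column."""
    if len(reference) != len(dotted):
        raise ValueError("aligned rows must have the same length")
    return "".join(r if d == "." else d for r, d in zip(reference, dotted))


def percent_identity(reference, dotted):
    """Percentage of compared columns in which the two rows are identical."""
    if len(reference) != len(dotted):
        raise ValueError("aligned rows must have the same length")
    compared = identical = 0
    for r, d in zip(reference, dotted):
        if r == "-" or d == "-":
            continue  # gap columns are not counted (assumption)
        compared += 1
        if d == "." or d == r:
            identical += 1
    return 100.0 * identical / compared


# Invented toy rows, not taken from the figure.
toy_reference = "ACDEFGHIKL"
toy_dotted = "..N....V.."

toy_expanded = expand_dot_alignment(toy_reference, toy_dotted)
toy_identity = percent_identity(toy_reference, toy_dotted)
```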

Proteinases are involved in many biological processes 
such as blood coagulation, fibrinolysis and complement 
activation [7]. However, the biological role of 
hepsin remains unclear since its enzymatic specificity 
and physiological substrates are presently unknown. 
Analysis of the amino-acid sequence of hepsin reveals 
several key residues which are similar to trypsin, especially 
in the highly conserved sequences which surround 
the catalytic site. Although substrate specificity 
is unknown, the presence of an Asp at position 346 
would suggest that hepsin exhibits trypsin-like activity 
since a similar residue is found in trypsin at the bottom 
of the substrate binding pocket [8]. The precise role of 
this enzyme will remain a subject of speculation until 
the native enzyme can be purified and further characterized. 



References 

1 Leytus, S.P., Loeb, K.R., Hagen, F.S., Kurachi, K. and Davie, 
E.W. (1988) Biochemistry 27, 1067-1074. 

2 Tsuji, A., Torres-Rosado, A., Arai, T., LeBeau, M.M., Lemons, 
R.S., Chou, S.H. and Kurachi, K. (1991) J. Biol. Chem. 266, 
16948-16953. 

3 Tsuji, A., Torres-Rosado, A., Arai, T., Chou, S.H. and Kurachi, K. 
(1991) Biomed. Biochim. Acta 50, 791-793. 

4 Ordman, A., Farley, D., Meyhack, B. and Nick, H. (1991) J. 
Steroid Biochem. Mol. Biol. 39, 487-492. 

5 Sanger, F., Nicklen, S. and Coulson, A.R. (1977) Proc. Natl. Acad. 
Sci. USA 74, 5463-5467. 

6 Hartmann, E., Rapoport, T.A. and Lodish, H.F. (1989) Proc. Natl. 
Acad. Sci. USA 86, 5786-5790. 

7 Neurath, H. (1986) J. Cell Biochem. 32, 35-49. 

8 Stroud, R.M., Kay, L.M. and Dickerson, R.E. (1974) J. Mol. Biol. 
83, 185-208. 







Exhibit 14 



Eur. J. Biochem. 267, 6931-6937 (2000) © FEBS 2000 



Localization of the mosaic transmembrane serine protease corin to 
heart myocytes 

John D. Hooper, Anthony L. Scarman, Belinda E. Clarke, John F. Normyle and Toni M. Antalis 

Cellular Oncology Laboratory, Queensland Institute of Medical Research, Brisbane, Queensland, Australia; 
Department of Anatomical Pathology, The Prince Charles Hospital, Chermside, Queensland, Australia 



Corin cDNA encodes an unusual mosaic type II transmembrane serine protease, which possesses, in addition to a 
trypsin-like serine protease domain, two frizzled domains, eight low-density lipoprotein (LDL) receptor domains, 
a scavenger receptor domain, as well as an intracellular cytoplasmic domain. In in vitro experiments, recombinant 
human corin has recently been shown to activate pro-atrial natriuretic peptide (ANP), a cardiac hormone essential 
for the regulation of blood pressure. Here we report the first characterization of corin protein expression in heart 
tissue. We generated antibodies to two different peptides derived from unique regions of the corin polypeptide, 
which detected immunoreactive corin protein of approximately 125-135 kDa in lysates from human heart 
tissues. Immunostaining of sections of human heart showed corin expression was specifically localized to the 
cross striations of cardiac myocytes, with a pattern of expression consistent with an integral membrane 
localization. Corin was not detected in sections of skeletal or smooth muscle. Corin has been suggested to be a 
candidate gene for the rare congenital heart disease, total anomalous pulmonary venous return (TAPVR) as the 
corin gene colocalizes to the TAPVR locus on human chromosome 4. However, examination of corin protein 
expression in TAPVR heart tissue did not show evidence of abnormal corin expression. The demonstrated corin 
protein expression by heart myocytes supports its proposed role as the pro-ANP convertase, and thus a potentially 
critical mediator of major cardiovascular diseases including hypertension and congestive heart failure. 

Keywords: serine protease; corin; heart; pro-atrial natriuretic peptide (pro-ANP); TAPVR. 



Serine proteases are found in all living organisms, ranging from 
viruses to humans [1], where they serve important and varied 
biological functions in situations requiring limited proteolysis. 
Their activities impact on areas as diverse as hemostasis, tissue 
remodelling and wound repair, inflammation, angiogenesis, 
fibrinogenesis and fibrinolysis. Cell surface serine proteases 
have been associated largely with extracellular matrix degra- 
dation, but there are emerging roles for these proteases in 
generating bioactive matrix protein fragments, influencing the 
release, the activation and bioavailability of growth factors and 
in shedding of cell surface proteins [2-6]. 

Many serine proteases are mosaic proteins comprising 
multiple, structurally distinct domains necessary for regulating 
enzymatic activity. Circulating serine proteases of the blood 
coagulation (e.g. prothrombin and factor X) [7], fibrinolysis 
(e.g. plasminogen activators) [8] and complement (e.g. C1r and 
C1s) [9] systems are well characterized examples of mosaic 
proteins. While the vast majority of known serine proteases are 
secreted, more recently some serine proteases have been found 
to possess integral transmembrane domains. The proteins 
enteropeptidase [10], hepsin [11] and most recently, TMPRSS2 
[12] are examples of mosaic serine proteases with type II 
transmembrane domains. These enzymes are positioned on the 
plasma membrane via a membrane spanning domain close to 
the N-terminus. In addition to membrane spanning and protease 
domains, enteropeptidase also contains two low-density 
lipoprotein (LDL) receptor domains, a meprin-like domain, two 
C1r-like domains and a truncated scavenger receptor domain. 
An LDL receptor domain and a scavenger receptor domain 
have also been identified in TMPRSS2 [12]. The functions of 
these domains have not been determined. 

Correspondence to T. M. Antalis, Queensland Institute of Medical 
Research, Post Office Royal Brisbane Hospital, Brisbane, 4029, 
Queensland, Australia. Fax: + 61 73362 0107, Tel.: + 61 73362 0312, 
E-mail: toniA@qimr.edu.au 

Abbreviations: LDL, low-density lipoprotein; ANP, atrial natriuretic 
peptide; TAPVR, total anomalous pulmonary venous return; tPA, 
tissue-type plasminogen activator; uPA, urokinase-type plasminogen 
activator; ang, angiotensin; ACE, angiotensin converting enzyme. 
(Received 24 July 2000, revised 12 September 2000, accepted 
4 October 2000) 

Serine proteases play important roles in several aspects of 
heart physiology and cardiovascular disease [13]. The mast cell 
serine protease chymase is believed to be the major converter of 
angiotensin (ang)I to angII in human heart tissue [14]. The 
involvement of angII in normal cardiac function as well as in 
heart ailments such as hypertrophy, heart failure and ischaemic 
heart disease is indicated by the finding that inhibition of the 
angiotensin converting enzyme (ACE), leads to beneficial 
outcomes for sufferers of these diseases [15]. However, ACE 
inhibitors block only 10-20% of angI conversion in heart tissue 
whereas the remaining activity is blocked by serine protease 
inhibitors [16]. The fibrinolytic serine proteases tissue-type 
plasminogen activator (tPA) and urokinase-type plasminogen 
activator (uPA) are also thought to be involved in the 
progression of heart disease. uPA is present at significantly 
elevated levels in the atherosclerotic lesions responsible for 
myocardial infarction and failure [17]. The reduction in tPA 
from arteriolar smooth muscle cells is linked to the develop- 
ment of coronary artery disease in transplanted hearts [18]. 

Our own work and that of Yan et al. [19] has led to the recent 
cloning of a cDNA encoding a novel, multidomain type II 
transmembrane serine protease from human heart. The 
predicted protein, corin, comprises two frizzled domains, eight 
LDL receptor domains, a truncated scavenger receptor domain, 
in addition to the extracellular trypsin-like serine protease 
domain [19]. Recent expression of recombinant corin demonstrates 
that it possesses pro-atrial natriuretic peptide (ANP) 
convertase activity [20], and thus may play a critical role in the 
regulation of hypertension. In situ hybridization studies of 
mouse embryonic heart showed that corin mRNA was 
expressed as early as day 9.5 and maintained its expression 
through the adult animal [19]. The corin gene was mapped to 
human chromosome 4p12-13 [19], near the locus for the 
congenital heart disease, total anomalous pulmonary venous 
return (TAPVR). Here we present data describing for the first 
time native corin protein expression and localization in human 
heart. 

MATERIALS AND METHODS 

Identification of corin cDNA by homology cloning 

Homology cloning was performed by RT-PCR using degenerate 
oligonucleotides corresponding to conserved regions of serine 
proteases [21-24]. Total RNA was isolated from SI a cells [25] 
following treatment with TNFα and cycloheximide for 4 h. 
RNA (5 µg) was reverse transcribed at 42 °C using AMV 
reverse transcriptase (Promega, Madison, WI) in the presence of 
oligo dT12-18 (0.25 µg·µL⁻¹) (Pharmacia Biotech, Sweden), 
50 mM Tris/HCl, pH 8.3, 50 mM KCl, 10 mM MgCl2, 10 mM 
dithiothreitol and 0.5 mM spermidine in a total volume of 
20 µL. PCR was performed using 1 µL of the reverse 
transcriptase reaction mixture, 500 ng of each primer, 10 mM 
Tris/HCl, pH 8.3, 50 mM KCl, 1.5 mM MgCl2, 0.2 mM dNTPs 
and 1-2 units of Taq polymerase (Perkin Elmer). The primers 
were as follows (where X = A or G; Y = C or T; I = inosine). 
Forward, 5'-ACAGAATTCTGGGTIGTIACIGCIGCICAYTG-3'; 
reverse, 5'-ACAGAATTCAXIGGICCICCI(C/G)(T/A)XTCICC-3'. 
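
The degeneracy implied by this primer notation can be worked out with a short Python sketch. It follows the symbol definitions given in the text (X = A or G, Y = C or T, I = inosine, and bracketed alternatives such as (C/G)); the function names are illustrative, and counting inosine as a single fixed position is an assumption made here, since the text does not discuss degeneracy.

```python
import re
from itertools import product

# Symbols as defined in the text; inosine is kept as one fixed base (assumption).
CHOICES = {"A": "A", "C": "C", "G": "G", "T": "T",
           "X": "AG", "Y": "CT", "I": "I"}

TOKEN = re.compile(r"\(([ACGT/]+)\)|([ACGTXYI])")
VALID = re.compile(r"(?:\([ACGT/]+\)|[ACGTXYI])+")


def parse_primer(primer):
    """Return one string of allowed bases per primer position."""
    if not VALID.fullmatch(primer):
        raise ValueError("unexpected character in primer")
    return [group.replace("/", "") if group else CHOICES[symbol]
            for group, symbol in TOKEN.findall(primer)]


def degeneracy(primer):
    """Number of distinct oligonucleotides covered by the notation."""
    count = 1
    for allowed in parse_primer(primer):
        count *= len(allowed)
    return count


def expand(primer):
    """List every oligonucleotide covered by the notation."""
    return ["".join(bases) for bases in product(*parse_primer(primer))]


FORWARD = "ACAGAATTCTGGGTIGTIACIGCIGCICAYTG"
REVERSE = "ACAGAATTCAXIGGICCICCI(C/G)(T/A)XTCICC"

forward_degeneracy = degeneracy(FORWARD)
reverse_degeneracy = degeneracy(REVERSE)
```

Using inosine at the fully ambiguous positions keeps the number of distinct oligonucleotides small, which is visible in the low counts returned here.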

Cycling conditions: 2 cycles of 94 °C for 2.5 min, 35 °C for 
2.5 min and 72 °C for 3 min, followed by 33 cycles of 94 °C 
for 2.5 min, 57 °C for 2.5 min and 72 °C for 3 min, with a final 
extension at 72 °C for 7 min. PCR products of approximately 
450 bp were ligated into pGEM-T (Promega, Madison, WI, 
USA), cloned and analysed by DNA sequencing. A DNA 
fragment was identified which represented the partial corin 
sequence (nucleotides 334-748). The cDNA was extended 333 
nucleotides towards the 5' end by screening a cDNA library 
using two rounds of PCR and the nested oligonucleotides 
ATC2P3 and ATC2P1 in combination with the vector specific 
primer T7. The 3' end was extended to nucleotide 976 by two 
rounds of PCR and the nested oligonucleotides ATC2P4 and 
ATC2P5 in combination with the vector specific primer T3. The 
primer sequences are given below. 

ATC2P1: 5'-GCGTGTCTGCATGAACACTG-3'; 
ATC2P2: 5'-ATGCCAAGCACCACTTTCCA-3'; 
ATC2P3: 5'-ATAGTCCACCACTGCTCGAC-3'; 
ATC2P4: 5'-TTAAGCTGCAAGAGGGAGAG-3'. 

The DNA sequence of this cDNA has been deposited in 
the DDBJ/Genbank/EMBL database under accession no. 
AF113248. 
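
The two-stage cycling programme described in this section can be written down as data, which makes the cycle count and the total hold time easy to check. The Python sketch below is illustrative only: the class and function names are assumptions, the first denaturation step is taken to be 94 °C, and the total covers hold times alone, ignoring the ramping between temperatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    minutes: float


@dataclass(frozen=True)
class Phase:
    cycles: int
    steps: tuple


# Values as stated in the text: a low-stringency start, then the main cycles,
# then a single final extension.
PROGRAMME = (
    Phase(2, (Step(94, 2.5), Step(35, 2.5), Step(72, 3))),
    Phase(33, (Step(94, 2.5), Step(57, 2.5), Step(72, 3))),
    Phase(1, (Step(72, 7),)),
)


def total_hold_minutes(programme):
    """Sum of all hold times, excluding temperature ramps."""
    return sum(phase.cycles * sum(step.minutes for step in phase.steps)
               for phase in programme)


def amplification_cycles(programme):
    """Count cycles in phases that contain more than one step."""
    return sum(phase.cycles for phase in programme if len(phase.steps) > 1)


hold_minutes = total_hold_minutes(PROGRAMME)
cycles = amplification_cycles(PROGRAMME)
```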

Heart tissue specimens 

Tissues from explanted hearts with terminal heart failure were 
either snap frozen in liquid nitrogen (for RNA and protein 
analyses) or processed for routine histological examination. Six 
paraffin embedded blocks of human heart tissue were obtained 
from autopsy cases with acute myocardial infarction. These 
blocks included both viable and nonviable myocardium. 
Procedures were in accordance with guidelines established by 
the National Health and Medical Research Council of Australia, 
Ethics Approval number EC9876(n). 

Northern and Poly(A)+ RNA dot blot analyses 

Human multiple tissue northern blots (Clontech, Palo Alto, CA, 
USA) contained 2 fxg of poly(A)^ RNA per lane. The blots 
were hybridized with a "'^P-dCTP labeled EcoKl digested DNA 
fragment encoding corin cDNA in ExpressHyb (Clontech) 
solution at 65 **C and washed to a final stringency of 
0.2 X NaCI/Cit, 0.1% SDS at 65 °C. The blot was reprobed 
with p-actin as a measure of loading in each lane. For the 
mouse tissue blot, total RNA was purified from mouse tissues, 
separated by denaturing gel electrophoresis and transferred to 
Hybond-N nylon membranes as described [26]. The blot was 
hybridized with the radiolabelled human corin DNA probe 
under lower stringency conditions in ExpressHyb solution at 
55 °C and washed to a final stringency of 1 × NaCl/Cit, 0.1% 
SDS at 55 °C. The mouse tissue blot was stained with ethidium 
bromide to confirm RNA loading in each lane. 
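
The two wash stringencies described above can be compared numerically. The sketch below assumes that NaCl/Cit denotes standard saline citrate, with a 1 × solution taken as 0.15 M NaCl and 0.015 M trisodium citrate; that composition is a conventional value and is not stated in the text.

```python
# Assumed composition of 1 x NaCl/Cit (standard saline citrate), in mM.
# These values are a convention, not taken from the paper.
ONE_X_NACL_MM = 150.0
ONE_X_CITRATE_MM = 15.0


def nacl_cit(fold):
    """Return (NaCl mM, citrate mM) for a given fold-concentration."""
    return (fold * ONE_X_NACL_MM, fold * ONE_X_CITRATE_MM)


def sodium_mm(fold):
    """Total Na+ in mM; trisodium citrate contributes three Na+ each."""
    nacl, citrate = nacl_cit(fold)
    return nacl + 3.0 * citrate


if __name__ == "__main__":
    # Human blot: final wash at 0.2 x; mouse blot: final wash at 1 x.
    for label, fold in (("human blot", 0.2), ("mouse blot", 1.0)):
        print(label, nacl_cit(fold), sodium_mm(fold))
```

Under these assumptions the mouse blot wash contains five times more salt than the human blot wash, and it was also run 10 °C cooler, both of which lower stringency for the cross-species probe.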

Production of affinity purified antipeptide polyclonal 
antibodies 

Rabbit polyclonal antibodies were generated against corin 
specific peptides derived from nonhomologous hydrophilic 
regions within the corin amino-acid sequence. Two peptides, 
each containing a cysteine residue incorporated at the C-terminus, 
were synthesized (Auspep, Parkville, Australia) and conjugated 
to keyhole limpet hemocyanin using m-maleimidobenzoic acid 
N-hydroxysuccinimide ester. The peptides were: A1: 
IQEQEKEPRWLTLHSNWE-C, A2: GHMGNKMPFKLQEGE-C. 
Rabbit antisera were peptide-affinity purified using SulfoLink 
coupling gel (Pierce, Rockville, IL). The specificity of each 
antibody was tested against the immunogenic peptide by 
ELISA. 

Western blot analysis 

Frozen heart tissue (100 mg) was homogenized in lysis-binding 
buffer (Dynabeads mRNA Direct kit, Dynal) and spun at 
13 000 × g for 2 min. The protein pellet was dissolved in 
reducing SDS-sample buffer for Western blot analysis. Proteins 
were separated by SDS/PAGE on 10% acrylamide gels and 
transferred electrophoretically to Hybond-P membranes 
(Amersham, Aylesbury, UK). Membranes were blocked with 
5% nonfat skim milk powder in Tris/NaCl (10 mM Tris/HCl, 
pH 7.0, 150 mM NaCl), incubated with affinity purified 
antipeptide antibody, then with horseradish peroxidase conjugated 
sheep anti-(rabbit Ig) secondary antibody, and visualized by 
enhanced chemiluminescence (Amersham, Aylesbury, UK). 

Immunohistochemistry 

Paraffin sections (5 µm) of formalin-fixed human heart were 
deparaffinized, then rehydrated before antigen retrieval in 
boiling 10 mM citric acid buffer, pH 6. After cooling, 
endogenous peroxidase activity was inhibited by 10 min 
incubation in 1% hydrogen peroxide. Non-specific antibody 
binding was blocked by incubating the sections in 4% nonfat 
skim milk powder in NaCl/Pi for 15 min, followed by 10% 



© FEBS 2000 



Corin is expressed by human myocytes (Eur. J. Biochem. 267) 6933 



Fig. 1. Corin expression in human and 
mouse tissues. (A) Northern blot analysis of 
RNA isolated from a range of normal human 
tissues probed with ³²P-labelled corin cDNA. 
The levels of β-actin mRNA are shown as a 
control for loading. (B) Northern blot analysis 
of corin mRNA expression in a range of mouse 
tissues probed with ³²P-labelled human corin 
cDNA at reduced stringency. The levels of 
18S ribosomal RNA are shown as a control 
for loading. 



[Fig. 1 image. Size markers: 7.5, 4.4. Band labels: Human Corin, β-actin, Mouse Corin.] 



normal goat serum for 20 min. Affinity purified anticorin A1 
(1 : 100; 150 µg·mL⁻¹) or A2 antibodies (1 : 50; 
20 µg·mL⁻¹) were applied and incubated overnight in a 
humidified chamber at room temperature. Controls included 
sections incubated with no primary antibody or antibody that 
had been preadsorbed for 2 h at room temperature with 1 µg of 
the antigenic peptide. Following incubation with prediluted 
biotinylated goat anti-(rabbit Ig) Ig (Zymed, San Francisco, 
CA, USA), streptavidin-horseradish peroxidase (Zymed) was 
applied and color developed using the chromogen 
3,3'-diaminobenzidine with hydrogen peroxide as substrate. The 
sections were counterstained in Mayer's haematoxylin. 



RESULTS AND DISCUSSION 

Isolation of human corin cDNA by homology cloning 

A PCR-based homology cloning approach was employed to 
identify serine protease cDNAs expressed by the Sla cell line 
[25] which is resistant to tumor necrosis factor-α induced 
apoptosis. Degenerate primers designed to anneal to cDNA 
encoding the conserved regions surrounding the catalytic 
histidine and serine amino acids of serine proteases [21-23], 
were used to amplify and then clone a range of DNA fragments 
of approximately 450 bp. One clone, designated ATC2, was 
found to encode a novel serine protease. The cDNA was 
extended in the 5' and 3' directions by library screening and the 
DNA sequence was deposited in the DDBJ/Genbank/EMBL 
database (accession no. AF113248). This sequence was 
subsequently determined to be 100% identical to a recently 
reported cDNA encoding the serine protease, corin (accession 
no. AF133845) [19]. 



Corin mRNA is strongly expressed in heart 

The tissue distribution of corin mRNA was examined by 
Northern blot analyses. Analysis of poly(A)+ RNA from 16 
normal human tissues showed a single transcript of approximately 
5.1 kb detectable only in human heart (Fig. 1A). 
Examination of a range of mouse tissues also demonstrated 
specific expression of corin mRNA of approximately 5.1 kb 
only in mouse heart (Fig. 1B). 







Fig. 2. Corin protein expression in human heart tissue by Western blot 
analysis. Immunoreactive corin protein of 125-135 kDa is detected in a 
protein lysate prepared from human heart tissue (Patient #7684), which is 
not detectable in a corin negative HeLa cell lysate. The blot was probed 
with anticorin antibody, AbA1, and visualized using enhanced 
chemiluminescence. The protein standards in kDa are as indicated. 



6934 J. D. Hooper et al. (Eur. J. Biochem. 267) 




Fig. 3. Corin is localized to human heart myocytes by immunostaining. Immunohistochemical staining of human heart tissues was performed using the 
affinity purified anticorin peptide A1 or A2 polyclonal antibodies as primary antibodies. (A) a longitudinal section of a representative heart tissue from a 
transplant recipient (Patient #7684) stained with AbA1 showing intense staining in the cardiac myocytes; (B) as (A) except the primary antibody was 
preadsorbed with the immunogenic peptide, A1, for 2 h; (C) the same tissue as (A) except stained with the weaker staining antibody, AbA2. Apparent 
staining at the poles of the nuclei are deposits of the brown lipochrome pigment, lipofuscin. (D) the same tissue as (A-C) processed in the absence of primary 
antibody; (E) a longitudinal section of normal myocardium from a heart which contained an acute infarct elsewhere (Patient #A4-99R) stained with AbA1 
showing intense staining corresponding to the cross striations; (F) staining of the same heart tissue as (E) with AbA1 showing intense staining in cross 
section. Photomicrographs (A-E) were taken at an original magnification of 100×. 



Anti-corin antibodies detect corin in heart lysates 

We generated polyclonal antibodies to two different peptides 
derived from unique regions of the corin polypeptide 
sequence in order to investigate its expression and localization 
in the heart. The first was a unique region within the serine 
protease catalytic domain between the conserved Asp and Ser 



amino-acid residues (AbA1) and the second was contained 
within the scavenger receptor domain (AbA2). Immunoblot 
analysis of corin protein expression in human heart protein 
lysates showed a major immunoreactive band of 125-135 kDa 
(Fig. 2), which was not present in lysates from the negative 
control HeLa cell line. This molecular mass is slightly lower 
than that reported (≈150 kDa) for recombinant V5/His6 



Fig. 4. Corin expression in neonate heart with TAPVR. Immunohistochemical staining of human neonate heart tissues was performed using the affinity 
purified anticorin peptide A1 polyclonal antibody as the primary antibody. (A) and (C) longitudinal sections of TAPVR heart tissue showing staining in the 
cardiac myocytes, corresponding to the cross striations; (B) and (D) longitudinal sections of a normal neonate heart showing a similar staining pattern in the 
cardiac myocytes. Photomicrographs (A) and (B) were taken at an original magnification of 100× and (C) and (D) were taken at an original magnification of 
40×. 



tagged corin expressed by human embryonic kidney 293 cells 
[20]. As the mature corin zymogen has a calculated mass of 
116 kDa [19], it is likely that the mature corin polypeptide 
undergoes a post-translational processing event, possibly 
glycosylation. Consistent with this, there are 19 predicted 
N-linked glycosylation sites present in the extracellular 
domains of corin [19]. 
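
The argument above rests on two numbers: the gap between the observed (125-135 kDa) and calculated (116 kDa) masses, and the count of predicted N-linked glycosylation sites. As a sketch of how such sites are commonly predicted, the snippet below scans a sequence for the N-X-S/T sequon, where X is any residue except proline. The demonstration sequence is invented for illustration and is not the corin sequence, and the generic sequon rule used here is an assumption, not necessarily the method used in [19].

```python
import re

# N-X-S/T sequon, X != P. The lookahead lets overlapping sequons be found.
_SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return 1-based positions of Asn residues in N-X-S/T sequons."""
    return [m.start() + 1 for m in _SEQUON.finditer(protein)]


def mass_gap(observed_range_kda, calculated_kda):
    """Return (min, max) excess of observed over calculated mass in kDa."""
    low, high = observed_range_kda
    return (low - calculated_kda, high - calculated_kda)


if __name__ == "__main__":
    demo = "MNGTANPSQNASNNTS"  # hypothetical sequence, not corin
    print(find_sequons(demo))
    # Values quoted in the text: 125-135 kDa observed, 116 kDa calculated.
    print(mass_gap((125, 135), 116))
```

With the values quoted in the text, the excess mass to be accounted for is 9-19 kDa.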



Corin is expressed by human heart myocytes 

To investigate the localization of corin expression in human 
heart, immunohistochemical analyses were performed on 
human adult heart tissues. Corin was abundantly expressed 
in cardiac myocytes, with intense brown staining associated 
with cross striations seen in longitudinally sectioned myofibers 
(Fig. 3A). In some areas there was accentuation of the plasma 
membrane, consistent with an integral membrane localization 
of corin. This same pattern of staining was observed in sections 
taken from all areas of the myocardium. Control slides using 
the AbAl polyclonal antibody in the presence of competing 
Al peptide showed absence of this specific staining pattern 
(Fig. 3B). An identical, albeit weaker staining pattern was 
observed in experiments performed using the second corin- 
specific antibody (AbA2) (Fig. 3C). No staining was detected 
in the absence of antibody (Fig. 3D). Staining of a section of 



viable myocardium from a heart containing an acute myocar- 
dial infarct showed a similar intense staining of the striations 
in cardiac myocytes (Fig. 3E) and a pinhead-like dot pattern 
when viewed in cross section (Fig. 3F). Necrotic heart tissue 
showed similar but much less intense staining (data not shown). 
Corin was not detected in sections of skeletal or smooth muscle 
(data not shown), suggesting that the function of corin is 
specifically related to cardiac muscle. 



Corin protein expression in a patient with the congenital 
heart disease, TAPVR 

The molecular mechanisms responsible for the developmental 
defect associated with the rare congenital heart disease TAPVR 
are not known. The location of the corin gene on human 
chromosome 4p12-13 [19] and the localization of the TAPVR 
locus to a 30 centimorgan interval on 4p13-q12 [27], suggested 
that corin may be a candidate for the TAPVR gene [19]. If corin 
plays a role in TAPVR, its expression may be lost or altered in 
TAPVR heart tissue. To explore this possibility, we examined 
corin protein expression in a TAPVR heart. The pattern of corin 
expression detected in this heart tissue (Fig. 4A,C) was similar 
to that observed in the adult heart and was identical to the 
pattern of corin staining in an age-matched neonate control 
heart (Fig. 4B,D). While these data are not consistent with a role 






[Fig. 5 image. Proteins shown: corin, enteropeptidase, TMPRSS2, hepsin. Domain labels: transmembrane domain, serine proteinase domain, truncated scavenger receptor cysteine-rich domain, LDL-receptor-like domains, frizzled domain, meprin-like domain, C1r-like domain.] 



Fig. 5. Diagram showing domain structures of corin compared with other mosaic integral membrane proteins. The domains are as indicated. The 
catalytic serine protease residues are circled. The disulfide bonds linking catalytic and pro-regions are marked. 



for corin in TAPVR, it does not exclude the possibility that 
TAPVR is associated with more subtle alterations to the corin 
gene; for example point mutations, that would not be detected 
by this method. 

Corin homology to other type II transmembrane proteases 

As illustrated in Fig. 5, corin is a mosaic integral membrane 
protein possessing discrete domains. The intracellular, cyto- 
plasmic domain contains two potential protein kinase C phos- 
phorylation sites which may represent mechanisms for signal 
relay to or from the cell surface. Corin contains two frizzled 
domains. These domains function in other molecules as 
receptors for Wnt proteins, which are implicated in signal 
transduction during development [28]. Corin possesses eight 
LDL receptor domains which can mediate uptake of LDLs [29] 
and have also been shown to be involved in binding and 
internalization of protease/inhibitor complexes [30]. LDLs 
regulate the transport of cholesterol and play a major role in 
the development of heart disease. Corin possesses a scavenger 
receptor domain, which in other proteins, binds polyanionic 
molecules including modified lipoproteins, cell surface lipids 
and some sulfated polysaccharides [31]. The trypsin-like serine 
protease domain is located at the C-terminus. 

Corin bears similarity to other known members of the 
integral membrane serine proteases as illustrated in Fig. 5. The 
corin serine protease domain is highly homologous to a 
multidomain integral-membrane serine protease found in the 
brush border of the intestine, enteropeptidase [32]. Entero- 
peptidase functions to activate digestive pancreatic enzymes 
released from the intestine. Activation of this cascade is critical, 
as illustrated by the life-threatening intestinal malabsorption 
that accompanies congenital deficiency of enteropeptidase [32]. 
Other proteases with homology to the corin serine protease 
domain are the integral-membrane serine proteases, TMPRSS2 
and hepsin. Hepsin is a hepatic serine protease that has been 
demonstrated to activate Factor VII in the extrinsic blood 
coagulation pathway leading to thrombin formation, and has 
further been shown to be required for mammalian cell growth 
[33]. 

In summary, we have confirmed heart as a site of abundant 
corin mRNA expression and demonstrated for the first time the 
expression of corin as a 125-135 kDa protein in this tissue. In 



addition, in heart we have localized corin protein to myocytes; 
the same cardiac cells expressing pro-ANP. These data support 
recently reported in vitro evidence that the corin proteolytic 
domain is the pro-ANP convertase [20] and thus, the proposal 
that corin has a role in regulating blood pressure. Possible 
additional functions of the serine protease domain and the 
functions of the other corin domains are not yet known. The 
putative phosphorylation sites in the cytoplasmic domain of 
corin may indicate that the intracellular domain of corin will be 
a target for phosphorylation and therefore may mediate 
signalling events from the cell surface. A better understanding 
of the role of corin in heart will provide insight into basic 
molecular mechanisms of cardiac function and could provide a 
rational target for both diagnostic and therapeutic applications. 

ACKNOWLEDGEMENTS 

This work was supported by grants from the Queensland Cancer Fund, 
Brisbane, Australia and the National Health and Medical Research Council 
of Australia. J. D. H. was supported by a John Earnshaw Scholarship from 
the Queensland Cancer Fund and by the Bancroft Scholarship, Queensland 
Institute of Medical Research. 

REFERENCES 

1. Rawlings, N.D. & Barrett, A.J. (1994) Families of serine peptidases. 
Methods Enzymol. 244, 19-61. 

2. Murphy, G. & Gavrilovic, J. (1999) Proteolysis and cell migration: 
creating a path? Curr. Opin. Cell Biol. 11, 614-621. 

3. LeMosy, E.K., Hong, C.C. & Hashimoto, C. (1999) Signal transduction 
by a protease cascade. Trends Cell Biol. 9, 102-107. 

4. Rifkin, D.B., Mazzieri, R., Munger, J.S., Noguera, I. & Sung, J. (1999) 
Proteolytic control of growth factor availability. Acta Path. Microbiol. 
Immunol. Scand. 107, 80-85. 

5. Dery, O. & Bunnett, N.W. (1999) Proteinase-activated receptors: a 
growing family of heptahelical receptors for thrombin, and tryptase. 
Biochem. Soc. Trans. 27, 246-254. 

6. Noel, A., Gilles, C., Bajou, K., Devy, L., Kebers, F., Lewalle, J.M., 
Maquoi, E., Munaut, C., Remacle, A. & Foidart, J.M. (1997) 
Emerging roles for proteinases in cancer. Invasion Metastasis 17, 
221-239. 

7. Ichinose, A. & Davie, E.W. (1994) The Blood Coagulation Factors: 
Their cDNAs, Genes, and Expression. In Hemostasis and 
Thrombosis: Basic Principles and Clinical Practice (Colman, R.W., 
Hirsh, J., Marder, V.J. & Salzman, E.W., eds), pp. 19-54. J.B. 
Lippincott Company, Philadelphia, PA, USA. 

8. Francis, C.W. & Marder, V.J. (1994) Physiologic Regulation and 
Pathologic Disorders of Fibrinolysis. In Hemostasis and Thrombosis: 
Basic Principles and Clinical Practice (Colman, R.W., Hirsh, J., 
Marder, V.J. & Salzman, E.W., eds), pp. 1076-1103. J.B. Lippincott 
Company, Philadelphia, PA, USA. 

9. Arlaud, G.J. & Thielens, N.M. (1993) Human complement serine 
proteases C1r and C1s and their proenzymes. Methods Enzymol. 223, 
61-82. 

10. Kitamoto, Y., Veile, R.A., Donis-Keller, H. & Sadler, J.E. (1995) 
Human complement serine proteases C1r and C1s and their 
proenzymes. Biochemistry 34, 4562-4568. 

11. Tsuji, A., Torres-Rosado, A., Arai, T., Le Beau, M.M., Lemons, R.S., 
Chou, S.H. & Kurachi, K. (1991) Hepsin, a cell membrane-associated 
protease. Characterization, tissue distribution, and gene localization. 
J. Biol. Chem. 266, 16948-16953. 

12. Paoloni-Giacobino, A., Chen, H., Peitsch, M.C., Rossier, C. & 
Antonarakis, S.E. (1997) Cloning of the TMPRSS2 gene, which 
encodes a novel serine protease with transmembrane, LDLRA, and 
SRCR domains and maps to 21q22.3. Genomics 44, 309-320. 

13. Schussheim, A.E. & Fuster, V. (1997) Thrombosis, antithrombotic 
agents, and the antithrombotic approach in cardiac disease. Prog. 
Cardiovascular Diseases 40, 205-238. 

14. Balcells, E., Meng, Q.C., Johnson, W.H. Jr, Oparil, S. & Dell'Italia, 
L.J. (1997) Angiotensin II formation from ACE and chymase in 
human and animal hearts: methods and species considerations. Am. J. 
Physiol. 273, H1769-H1774. 

15. Wolny, A., Clozel, J.P., Rein, J., Mory, P., Vogt, P., Turino, M., 
Kiowski, W. & Fischli, W. (1997) Functional and biochemical 
analysis of angiotensin II-forming pathways in the human heart. 
Circ. Res. 80, 219-227. 

16. Bumpus, F.M. (1991) Angiotensin I and II. Some early observations 
made at the Cleveland Clinic Foundation and recent discoveries 
relative to angiotensin II formation in human heart. Hypertension 18, 
122-125. 

17. Kienast, J., Padro, T., Steins, M., Li, C.X., Schmid, K.W., Hammel, D., 
Scheld, H.H. & Van De Loo, J.C. (1998) Relation of urokinase-type 
plasminogen activator expression to presence and severity of 
atherosclerotic lesions in human coronary arteries. Thromb. Haemost. 
79, 579-586. 

18. Labarrere, C.A., Pitts, D., Nelson, D.R. & Faulk, W.P. (1995) Vascular 
tissue plasminogen activator and the development of coronary 
artery disease in heart-transplant recipients. N. Engl. J. Med. 333, 
1111-1116. 

19. Yan, W., Sheng, N., Seto, M., Morser, J. & Wu, Q. (1999) Corin, a 
mosaic transmembrane serine protease encoded by a novel cDNA 
from human heart. J. Biol. Chem. 274, 14926-14935. 

20. Yan, W., Wu, F., Morser, J. & Wu, Q. (2000) Corin, a transmembrane 
cardiac serine protease, acts as a pro-atrial natriuretic peptide- 
converting enzyme. Proc. Natl Acad. Sci. USA 97, 8525-8529. 



21. Sakanari, J.A., Staunton, C.E., Eakin, A.E., Craik, C.S. & McKerrow, 
J.H. (1989) Serine proteases from nematode and protozoan parasites: 
isolation of sequence homologs using generic molecular probes. 
Proc. Natl Acad. Sci. USA 86, 4863-4867. 

22. Elvin, C.M., Whan, V. & Riddles, P.W. (1993) A family of serine 
protease genes expressed in adult buffalo fly (Haematobia irritans 
exigua). Mol. Gen. Genet. 240, 132-139. 

23. Elvin, C.M., Vuocolo, T., Smith, W.J., Eisemann, C.H. & Riddles, P.W. 
(1994) An estimate of the number of serine protease genes 
expressed in sheep blowfly larvae (Lucilia cuprina). Insect Mol. 
Biol. 3, 105-115. 

24. Hooper, J.D., Nicol, D.L., Dickinson, J.L., Eyre, H.J., Scarman, A.L., 
Normyle, J.F., Stuttgen, M.A., Douglas, M., Loveland, K.A.L., 
Sutherland, G.R. & Antalis, T.M. (1999) Testisin, a new human serine 
proteinase expressed by premeiotic testicular germ cells and lost in 
testicular germ cell tumors. Cancer Res. 59, 3199-3205. 

25. Dickinson, J.L., Bates, E.J., Ferrante, A. & Antalis, T.M. (1995) 
Plasminogen activator inhibitor type 2 inhibits tumor necrosis factor 
alpha induced apoptosis. Evidence for an alternate biological 
function. J. Biol. Chem. 270, 27894-27904. 

26. Antalis, T.M. & Dickinson, J.L. (1992) Control of plasminogen 
activator inhibitor type 2 gene expression in the differentiation of 
monocytic cells. Eur. J. Biochem. 205, 203-209. 

27. Bleyl, S., Nelson, I., Odelberg, S.J., Ruttenberg, H.D., Otterud, B., 
Leppert, M. & Ward, K. (1995) A gene for familial total anomalous 
pulmonary venous return maps to chromosome 4p13-q12. Am. J. 
Hum. Genetics 56, 408-415. 

28. Cadigan, K.M. & Nusse, R. (1997) Wnt signaling: a common theme in 
animal development. Genes Dev. 11, 3286-3305. 

29. Bujo, H., Yamamoto, T., Hayashi, K., Hermann, M., Nimpf, J. & 
Schneider, W.J. (1995) Mutant oocytic low density lipoprotein 
receptor gene family member causes atherosclerosis and female 
sterility. Proc. Natl Acad. Sci. USA 92, 9905-9909. 

30. Kounnas, M.Z., Church, F.C., Argraves, W.S. & Strickland, D.K. 
(1996) Cellular internalization and degradation of antithrombin III- 
thrombin, heparin cofactor II-thrombin, and α1-antitrypsin-trypsin 
complexes is mediated by the low density lipoprotein receptor-related 
protein. J. Biol. Chem. 271, 6523-6529. 

31. Resnick, D., Chatterton, J.E., Schwartz, K., Slayter, H. & Krieger, M. 
(1996) Structures of class A macrophage scavenger receptors. 
Electron microscopic study of flexible, multidomain, fibrous proteins 
and determination of the disulfide bond pattern of the scavenger 
receptor cysteine-rich domain. J. Biol. Chem. 271, 26924-26930. 

32. Kitamoto, Y., Yuan, X., Wu, Q., McCourt, D.W. & Sadler, J.E. (1994) 
Enterokinase, the initiator of intestinal digestion, is a mosaic protease 
composed of a distinctive assortment of domains. Proc. Natl Acad. 
Sci. USA 91, 7588-7592. 

33. Torres-Rosado, A., O'Shea, K.S., Tsuji, A., Chou, S.H. & Kurachi, K. 
(1993) Hepsin, a putative cell-surface serine protease, is required for 
mammalian cell growth. Proc. Natl Acad. Sci. USA 90, 7181-7185. 





Exhibit 15 



Minireview 



The Journal of Biological Chemistry 
Vol. 276, No. 2, Issue of January 12, pp. 857-860, 2001 
© 2001 by The American Society for Biochemistry and Molecular Biology, Inc. 

Printed in 



Type II Transmembrane Serine 
Proteases 

INSIGHTS INTO AN EMERGING CLASS OF CELL 
SURFACE PROTEOLYTIC ENZYMES* 

Published, JBC Papers in Press, November 1, 2000, 

DOI 10.1074/jbc.R000020200 

John D. Hooper‡¶, Judith A. Clements‡, 
James P. Quigley§, and Toni M. Antalis¶‖ 

From the ‡Centre for Molecular Biotechnology, 
Queensland University of Technology, Gardens Point, 
Brisbane 4000, Australia, §The Scripps Research 
Institute, La Jolla, California 92037, and the ¶Cellular 
Oncology Laboratory, University of Queensland and the 
Queensland Institute of Medical Research, 
Brisbane 4029, Australia 

Cell surface proteolysis has emerged as an important mechanism 
for the generation of biologically active proteins that mediate 
a diverse range of cellular functions. The proteolytic activities of 
membrane-anchored proteins, such as ADAMs¹ (1) and MT-MMPs 
(2), are thought to play central roles in cell surface-activating 
events. In contrast, most of the members of the serine protease 
family, one of the oldest characterized and largest multigene 
proteolytic families, are either secreted enzymes or sequestered in 
cytoplasmic storage organelles awaiting signal-regulated release. 
These serine proteases have well characterized roles in diverse 
cellular activities, including blood coagulation, wound healing, 
digestion, and immune responses, as well as tumor invasion and 
metastasis. However, during the last few years there has been an 
explosion in the identification of transmembrane proteins containing 
C-terminal extracellular serine protease domains. These enzymes 
are ideally positioned to interact with other proteins on the 
cell surface as well as soluble proteins, matrix components, and 
proteins on adjacent cells. In addition, these membrane-spanning 
proteases have cytoplasmic N-terminal domains, suggesting possible 
functions in intracellular signal transduction. This review 
delineates for the first time this emerging class of cell surface 
proteolytic enzymes, the type II transmembrane serine proteases 
(TTSPs), to highlight their structural features, expression profiles, 
and possible roles in mediating cell surface proteolytic events. 

Structural Features of TTSPs 

In mammals the TTSPs currently consist of 17 members (Table 
I), of which seven are found in man. Enteropeptidase (also known 
as enterokinase) (3), because of its essential role in the processing 
of digestive proteases, was the first member of this group to be 
discovered nearly a century ago. The other more recently identified 
members include hepsin (4), human airway trypsin-like protease 
(HAT) (5), corin (6), MT-SP1 (7) (also known as matriptase (8)), 



* This minireview will be reprinted in the 2001 Minireview Compendium, 
which will be available in December, 2001. This work was supported by the 
National Health and Medical Research Council of Australia, the Queensland 
Cancer Fund, and the National Institutes of Health. 

‖ To whom correspondence should be addressed. Tel.: 617-3362-0312; Fax: 
617-3362-0107; E-mail: toniA@qimr.edu.au. 

¹ The abbreviations used are: ADAM, a disintegrin-like and metalloproteinase; 
ANP, atrial natriuretic peptide; CUB, C1s/C1r, urchin embryonic 
growth factor, and bone morphogenetic protein 1; ECM, extracellular matrix; 
HAT, human airway trypsin-like protease; LDL, low density lipoprotein; 
MAM, meprin, A5 antigen, and receptor protein phosphatase μ; MT-MMP, 
membrane-type matrix metalloproteinase; PAI-1, plasminogen activator 
inhibitor-1; PAR, protease-activated receptor; SEA, sea urchin sperm 
protein-enterokinase-agrin; SR, Group A scavenger receptor; st-sb, 
stubble-stubbloid; TAPVR, total anomalous pulmonary venous return; TTSP, type II 
transmembrane serine protease; uPA, urokinase-type plasminogen activator; 
uPAR, uPA receptor. 



TMPRSS2 (9), and most recently TMPRSS4² (10). The only 
nonmammalian TTSP identified to date is the Drosophila protease 
stubble-stubbloid (st-sb) (11). Mammalian orthologues have been 
reported for enteropeptidase (mouse (12), rat (13), cow (14), and pig 
(15)), hepsin (mouse (16) and rat (17)), corin (mouse, also known as 
LRP4 (18)), MT-SP1 (mouse, also known as epithin (19)), and 
TMPRSS2 (mouse, also known as epitheliasin (20)) (Table I). The 
TTSPs share a number of common structural features including (i) 
a proteolytic domain, (ii) a transmembrane domain, (iii) a short 
cytoplasmic domain, and (iv) a variable length stem region containing 
modular structural domains, which links the transmembrane 
and catalytic domains (Fig. 1). It is this unique combination of 
domains that suggests novel roles for the TTSPs at the cell surface. 

Proteolytic Domains — As is the case for the wider family of 
enzymes of the chymotrypsin (S1) fold,³ the proteolytic domains of 
the TTSPs share a high degree of amino acid sequence identity. In 
particular, the histidine, aspartate, and serine residues necessary 
for catalytic activity are present in highly conserved motifs. TTSPs 
are synthesized as single chain zymogens and are likely activated 
by cleavage following an arginine or lysine present in a highly 
conserved activation motif. Based on the predicted presence of a 
conserved disulfide bond linking the pro- and catalytic domains 
(Fig. 1), the TTSPs are likely to remain membrane-bound following 
activation. However, the isolation of soluble forms of enteropeptidase 
(21, 22), HAT (23), and MT-SP1 (24) suggests that the extracellular 
domains of at least some of the TTSPs may also be shed 
from the cell surface. Other cysteine residues conserved among the 
TTSPs include six cysteines predicted to form three intraprotease 
domain disulfide bonds. Enteropeptidase and hepsin each have one 
and corin has two additional predicted disulfide linkages within the 
catalytic domain. The presence of an aspartate six residues before 
the catalytic serine, which in the activated TTSP would be positioned 
at the bottom of the S1 substrate binding pocket, is indicative 
that all of the TTSPs have preference for substrates containing 
an arginine or lysine in the P1 amino acid position (S1 and P1 
designations are described (25)). The cleavage specificities and 
candidate physiological substrates for some of the TTSPs have been 
elucidated. The predicted cleavage specificity following basic amino 
acids indicates that the TTSPs are likely to have a degree of 
autocatalytic activity. Indeed truncated mouse hepsin lacking 
cytoplasmic and transmembrane domains (16) and the human MT-SP1 
proteolytic domain (7) are capable of autoactivation. In contrast, 
bovine enteropeptidase has extremely low autocatalytic 
activity (26). Interestingly, the proteolytic domain of bovine 
enteropeptidase has an additional role in the targeting of 
enteropeptidase to the apical membrane of enterocytes (27). 
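
The predicted P1 preference described above (cleavage following arginine or lysine) can be illustrated with a short Python sketch that splits a peptide after every basic residue. The rule is deliberately simplified, the function name is illustrative, and the demonstration peptide is invented; it is not a TTSP activation motif taken from the text.

```python
def cleave_after_basic(peptide):
    """Split a peptide after each Arg (R) or Lys (K) residue.

    This is a simplified model of trypsin-like P1 specificity; it
    ignores all subsite effects beyond the P1 residue itself.
    """
    fragments = []
    start = 0
    for i, residue in enumerate(peptide):
        # Cut after a basic residue unless it is the last residue.
        if residue in "RK" and i < len(peptide) - 1:
            fragments.append(peptide[start:i + 1])
            start = i + 1
    fragments.append(peptide[start:])
    return fragments


if __name__ == "__main__":
    # Hypothetical peptide used only to exercise the rule.
    print(cleave_after_basic("AAKGGRVVGG"))
```

Real proteases are far more selective than this rule suggests, which is consistent with the observation in the text that autocatalytic activity differs widely between hepsin, MT-SP1 and enteropeptidase.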

Transmembrane Domains — Each of the TTSPs contains a hydro- 
phobic domain near the N terminus. This domain is predicted to 
span the plasma membrane in such a way that the proteolytic 
domain lies extracellularly, presumably to localize TTSP proteolytic 
activity in close proximity to target substrates and/or to permit 
regulated release of the protein from the cell surface. Cell 
surface localization has been experimentally demonstrated for 
enteropeptidase, hepsin (28, 29), MT-SP1 (30, 31), TMPRSS2 (20), 
and TMPRSS3 (10). 
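
A hydrophobic, potentially membrane-spanning segment of the kind described above is often located with a sliding-window hydropathy average. The sketch below assumes the Kyte-Doolittle hydropathy scale, a 19-residue window and a cutoff of 1.6, which are conventional choices and are not stated in the text; the demonstration sequence is invented and does not come from any TTSP.

```python
# Kyte-Doolittle hydropathy values (assumed scale, not from the text).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_averages(seq, window=19):
    """Return the mean hydropathy of every full-length window."""
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def best_window(seq, window=19):
    """Return (0-based start, mean hydropathy) of the top window."""
    averages = window_averages(seq, window)
    start = max(range(len(averages)), key=averages.__getitem__)
    return start, averages[start]


if __name__ == "__main__":
    # Hypothetical sequence: polar flanks around a hydrophobic core.
    demo = "MKRSEDNQK" + "LLIVALLIVALLIVALLIV" + "KRDESNQ"
    start, score = best_window(demo)
    print(start, round(score, 2), score > 1.6)
```

A window scoring above the cutoff near the N terminus, followed by an extracellular protease domain, is the arrangement that defines the type II orientation discussed in the text.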

Cytoplasmic Domains — The cytoplasmic domains of the TTSPs 
(Fig. 1) range in length from 12 amino acids for HAT to 112 amino 
acids for murine corin. Whether these domains have the potential 
to support interactions with cytoskeletal components and signaling 
molecules is not yet known. However, a number of the TTSPs 
including corin, MT-SP1, st-sb, and TMPRSS2 contain consensus 



² Originally designated TMPRSS3 (10). The Human Genome Nomenclature 
Committee-approved symbol TMPRSS3 has been allocated to a predicted 
TTSP-encoding gene located on chromosome 21q22.3 (66). The amino 
acid sequence of the TMPRSS3 protein has not been reported. 

³ Information on the classification and nomenclature of the S1 family of 
peptidases can be found in the Internet-accessible MEROPS data base. 


This paper is available on line at http://www.jbc.org 






Minireview: Type II Transmembrane Serine Proteases 



Table I 

Summary of type II transmembrane serine proteases 
The abbreviations used are: b, brain; bl, bladder; bp, Drosophila S6-h pupae; c, colon; de, Drosophila 12-18-h embryo; dp, Drosophila early 
prepupae; e, esophagus; h, heart; int, intestine; k, kidney; l, lung; le, leukocytes; li, liver; p, pancreas; pl, placenta; pr, prostate; psi, proximal small 
intestine (si); s, spleen; st, stomach; t, testes; th, thymus; tr, trachea. 







Other 


% Identity to 


Accession 


MW 


Gene 




Expression Pattern 


mRNA 




Name 


Orgzmism 


Name 


Human Onhologue 


Number 


(kDa) 


Location 






♦ 


Size(kb) 


Reference 


Conn 


Human 


. 


too 


AF 133645 


-150* = 


4p12-13 


h 


- 


- 


5 


(6.56) 




Mouse 


LRP4 


82 


A3013874 


123 




h 


I 




5 


(18) 


Enter opeplidase 


Huntan 


Enterokina»e 


100 


U098(K) 


156* 


21q2l 


psi 






4.4 


(3) 




Bovire 




83 


U09859* 


ISO" 












(14) 




Mouse 




75 


U73378 


118.7 












(12) 




Rat 




73 


1589367 


117.7 




psi 




b. c. %\ 


4.4 


(13) 




Porcine 




as 


030799 


200* 


* 










(15) 


MT-SP1 


Human 


Matriptase


100 


AFt330ee/AF118224 


67^ 


11q2S 


c. 5i. St pr 




k. %, le 


3.3 


(7. 6.31) 




Mouse 


Epithin 


81 


AF042822 


94.4 


9' 


int. k 




). s. Ih 


3 


(19) 


HAT 


Human 




100 


AB002134 


48* 




\r 




* 


0.9. 1.9.3.0 


(5) 


Hepsin 


Human 




100 


M 18930 


51" 


19q13.1 


li 




k, 1, p. pr 


1.8 


(4) 




Mouse 




88 


AF030065' 


44.7 








• 


1.8. 1.9 


(18) 




Rat 




88 


X70300 


44.9 


* 




• 






(17)


Stubble-stubbloid


Drosophila




* 


L114S1 


as 




bp, de. dp 






3.8 


(11) 


TMPRSS2 


Human 




100 


U75329


53.8


21q22.3






c, k. 1, li, 0 


3.8 


(9) 




Mouse 


Epitheliasin


77 


AF113596 




16C2 


k 


1 


It 


1.5.2.8 


(20) 


TMPRSS4 


Human 




100 


AF179224


68"- = 


1lq23^ 






bl. c. e. k, $i, 9l 


2.3 


(10) 



Table footnotes, in order:
Splice variants have been identified.
Experimentally derived molecular weight.
V5/His6-tagged protein.
Putative assignment based on our unpublished observation that LRP4 sequences have greater than 96% identity with mouse chromosome 5 BAC RP23-294A15 sequences deposited in the GenBank™ htgs database (GenBank™ accession no. AC036146).
Closest linkage to the Fli1 gene.



phosphorylation sites for either or both of protein kinase C and
casein kinase II. In addition, based on the cellular sorting of other
integral membrane proteins (32) it is likely that the cytoplasmic
and transmembrane domains also contribute to the targeting of the
TTSPs to a particular cell surface in polarized cells.

Stem Regions — The stem regions of the TTSPs contain as many as 
11 structural domains that may serve as regulatory and/or binding 
domains (Fig. 1). These include low density lipoprotein (LDL)
receptor class A domains, Group A scavenger receptor (SR) domains,
frizzled domains, C1s/C1r, urchin embryonic growth factor and bone
morphogenic protein 1 (CUB) domains, sea urchin sperm protein,
enterokinase, agrin (SEA) domains, a meprin, A5 antigen, and
receptor protein phosphatase (MAM) domain, and a disulfide knotted
domain. Hepsin is the only TTSP that does not possess an identified
structural domain within its stem region. Although functional roles
for individual stem region domains have not been demonstrated, the
stem region of bovine enteropeptidase has been shown to be required
for efficient cleavage of its physiological substrate trypsinogen (26).
In addition, the N terminus of the stem region of this protein is
required for delivery of enteropeptidase to the apical surface of
polarized Madin-Darby canine kidney cells (27).

The most common stem region structural domain is the LDL
receptor class A domain: corin contains eight, MT-SP1 four,
enteropeptidase two, and TMPRSS2 and TMPRSS4 one each (Fig. 1).
Although the function of these domains in the TTSPs has not been
demonstrated, in other proteins they bind Ca²⁺ ions and mediate the
internalization of macromolecules including serine protease-inhibitor
complexes and lipoproteins (33-35). In addition, although LDL
receptor domains also function in the uptake of LDLs, increased LDL
uptake could not be demonstrated following expression of murine
corin in COS cells (18).

Six other structural domains that are thought to be involved in
protein-protein interactions or protein-ligand interactions are
found in various TTSPs. SR domains (36) are present in corin,
enteropeptidase, TMPRSS2, and TMPRSS3; frizzled domains (37)
are present in corin; CUB domains (38) are present in
enteropeptidase and MT-SP1; SEA domains (39) are present in HAT and
enteropeptidase; a MAM domain (40) is present in enteropeptidase;
and a disulfide knotted domain (41) is present in st-sb (Fig. 1). In
addition to these structural domains, human and mouse MT-SP1s
possess a conserved RGD motif (42) present in the first CUB
domain. Interestingly, truncated human MT-SP1 lacking cytoplasmic
and transmembrane domains remains bound to the cell surface
of COS cells (31). Binding may be mediated via an interaction
between the MT-SP1 RGD motif and an integrin protein or another
cell surface protein. Alternatively, the mode of attachment could be
via a direct link such as a hydrocarbon chain.

Tissue Expression of TTSPs 

Although a few of the TTSPs are expressed across several tissue
and cell types, in general these enzymes demonstrate relatively
restricted expression patterns, indicating that they may have
tissue-specific functions (Table I). Enteropeptidase shows a very
narrow expression pattern, being restricted in normal tissues to
enterocytes of the proximal small intestine (12). Corin expression is
also quite specific, with corin mRNA highly expressed in human
heart (6) and corin protein expression localized to cardiac myocytes
(43). HAT is predominantly expressed in trachea (5, 23). Human
TMPRSS2 expression is predominantly associated with prostate (9,
44).⁴ Hepsin, originally identified from liver, is highly expressed in
fetal liver and kidney (45). Hepsin mRNA has been reported to be
overexpressed by ovarian tumors (46), and protein expression has
been localized to tumor cell membranes in renal cell carcinoma
(29). TMPRSS4 has only recently been characterized and was
identified as a consequence of its strong up-regulation in pancreatic
tumors (10). While TMPRSS4 was not detected in normal pancreas,
very low level TMPRSS4 mRNA expression was detected in tissues
of the gastrointestinal tract and in some tissues of the urogenital
tract (10). MT-SP1 was originally identified from a human breast
cancer line (30) but shows the broadest pattern of expression of the
TTSPs, being detected in a wide range of both human (7) and
murine tissues (19).

Biochemical Data and Pathophysiological Roles 
The majority of the TTSPs have been identified relatively re- 
cently and consequently have not been extensively characterized. 
Enteropeptidase is somewhat of an exception. Although the enzy- 
matic activity ascribed to enteropeptidase was first identified al- 
most a century ago (47) it has been only recently that the complete 
amino acid sequence was described (3). Enteropeptidase functions
near the apex of the digestive enzymatic cascade activating the
digestive protease trypsinogen to trypsin, which subsequently
activates other enzymes including chymotrypsinogen, proelastase,
prolipases, and procarboxypeptidases. Enteropeptidase possesses
extremely low autocatalytic activity, and it has been proposed that 
the serine protease duodenase, secreted by duodenal epitheliocytes, 
may be its physiological activator (48). Active enteropeptidase con- 



⁴ The Northern blot data reported (9) are incorrectly labeled due to
inversion of the membranes (Stylianos Antonarakis, personal communication).






Fig. 1. Type II transmembrane serine protease domain structure.
Structures, listed by length, are of the seven human TTSPs and
the Drosophila TTSP st-sb. The amino acid (aa) sequence of each 
protein was scanned using the ProfileScan algorithm to confirm the 
presence of each domain. Numbers delineate the location of each 
domain. 



sists of heavy and light chains that are extensively glycosylated 
(27, 49). It has recently been reported that physiological
concentrations of pancreatic trypsin activate protease-activated receptor
(PAR) 2 at the apical membrane of enterocytes (50). PAR2 is a
member of the PAR family of signal-transducing, G protein-coupled,
plasma membrane-spanning receptors, which are activated
by the proteolytic action of select serine proteases (51, 52). These
data and the observation that an exosite in the heavy chain of
enteropeptidase is required for efficient recognition of trypsinogen
(26) suggest that enteropeptidase may play a role in facilitating
trypsin-mediated PAR2 activation on enterocytes. Thus
enteropeptidase may localize trypsinogen/trypsin at the membrane of
enterocytes, initiating a limited proteolytic cascade at the cell surface in
close proximity to the trypsin cleavage target PAR2, thereby
facilitating receptor activation and signal transduction.

Hepsin is a glycoprotein originally cloned from human liver and
hepatoma cell lines and, more recently, implicated in mammalian
cell growth and morphology (53), tumor progression (28), and
developmental processes, such as blastocyst hatching (16). The
importance of hepsin in vivo, however, remains unclear as
homozygous hepsin null mice are phenotypically normal (54). An as yet
unexplained phenotype of the hepsin -/- mice is a 2-fold higher
serum concentration of bone-derived alkaline phosphatase
compared with wild type mice (55).

The human airway TTSP, HAT, was originally purified as a
soluble protein from the sputum of patients with chronic airway
diseases. Full-length HAT is synthesized, translocated to the cell
surface where it is processed to a soluble form, and then released
from tracheal serous glands as part of the host immune defense
system (5).

Significantly, the human heart TTSP, corin, is an in vitro
activator of pro-atrial natriuretic peptide (ANP), a cardiac hormone
essential for the regulation of blood pressure (56), suggesting that
corin is the long sought pro-ANP convertase. This proteolytic
cleavage is critical for the regulation of ANP activity (57); thus, corin
may well prove to be an important factor in the regulation of major
cardiovascular diseases. Dysfunctional corin was proposed to be a
candidate for the rare congenital heart disease, total anomalous
pulmonary venous return (TAPVR), as the corin gene colocalizes to
the TAPVR locus on human chromosome 4p12-13 (6). In addition
to heart, murine corin is expressed by chondrocytes in a
differentiation stage-specific manner during mouse development,
suggesting that this protease may play a role during chondrocyte
differentiation/bone formation (6). However, while human and murine
corin share high homology, common structural features, expression
profiles, and syntenic chromosomal locations, these proteases are
variant in the lengths of their cytoplasmic domains (45 residues in
human and 112 in mouse) and show no conservation in amino acid
sequence in this domain. This may indicate that murine and human
corin have different but perhaps overlapping species-specific
roles, or alternatively the cytoplasmic domain is not essential for
corin functions.

In other significant recent experiments it has been shown that
MT-SP1 may be involved in initiating signaling and proteolytic
cascades via the activation of the cell surface-associated proteins
PAR2 and pro-uPA (31). Interestingly, MT-SP1 from breast cancer
cells is detected largely as an uncomplexed protein, whereas in
milk it is present mainly as a complex with the Kunitz-type serine
protease inhibitor hepatocyte growth factor inhibitor-1 (24). It will
be important to identify the inhibitor binding domains of MT-SP1
and the function of the protease-inhibitor complex.

TMPRSS2 and TMPRSS4 have been identified through associa- 
tion with cancer. TMPRSS2 is thought to play a role in epithelial 
cell biology, and its association with prostate carcinogenesis has led 
to the proposal that it may be a diagnostic or therapeutic target for
prostate cancer (44). TMPRSS2 has been proposed to be part of an 
enzymatic cascade involving the serine proteases prostate-specific 
antigen and human kallikrein K2 in a manner analogous to the 
fibrinolytic and blood coagulation cascades (44). TMPRSS4 is over- 
expressed in pancreatic cancers; however, its functional signifi- 
cance remains unclear (10). 

The Drosophila serine protease st-sb is one of a number of 
proteases involved in fly morphogenesis (11) and has a proteolytic 
function in detaching imaginal disks from extracellular matrices. In 
addition, the phenotype of st-sb mutants has led to speculation that
the encoded protein is involved in outside to inside signal transduc- 
tion via its cytoplasmic domain, thus resulting in cytoskeletal reor- 
ganization and changes in cell shape during morphogenesis (11). 

Analogous Membrane-associated Proteolytic Systems
In contrast to the traditional protein catabolic functions of many
of the secreted members of the serine protease family and based on
the presence of multiple structural domains in the TTSPs, it is
tantalizing to speculate that the TTSPs function as key regulators
of signaling events at the plasma membrane. Precedents for such
functions come from other more well characterized
membrane-associated proteolytic systems such as the ADAMs (1), the
MT-MMPs (2), and the uPA-uPA receptor system (58).

The ADAMs have recognized and proposed roles in the proteol- 
ysis of extracellular matrix (ECM) components and cell surface 
proteins, in mediating cell adhesion via integrin binding, in cell 
fusion and signaling via interactions of their cytoplasmic domains, 
and in RGD-mediated interactions with integrins (59-61). The
TTSPs are similarly positioned at the plasma membrane to release
ECM components and to proteolytically activate cell surface proteins
such as PARs, growth factors, and cytokines, and to interact with cell
surface and soluble ligands. In addition, the presence of the cytoplas- 
mic domains indicates that the TTSPs may be capable of interacting 
with the cytoskeleton and/or with cellular signaling molecules. 

The MT-MMPs function in pericellular cascades to activate other
MMPs involved in the cleavage of ECM components. The TTSPs may
well perform similar functions in activating proteolytic cascades on
the plasma membrane. Indeed, this function has been demonstrated
for enteropeptidase in the activation of digestive proteases. Moreover,
there is increasing evidence for cross-talk between proteolytic
systems. The uPA-uPA receptor system of cell surface-localized
proteolytic activity has a recognized role in the initial stage of MMP
activation (62), and other serine proteases are also capable of in vitro
MMP activation (63, 64). The TTSPs could play a direct role in MMP
activation or an indirect role in localizing and activating other serine
proteases more directly associated with MMP activation. The
activation of uPA by MT-SP1 (31) and subsequent downstream MMP
activation could be an example of such cross-talk.

Several other parallels may also be drawn from the uPA-uPA
receptor system. That the TTSPs are directly anchored to the
plasma membrane implies that they have potential to mimic
localization of the uPA-uPAR system to the leading edge of migrating
tumor cells (65). Further, the interaction of the uPA-uPAR system,
via a nonproteolytic mechanism, in mediating cell-cell contacts
through association with integrins may also parallel TTSP
properties. Indeed the multidomain structure of the TTSPs indicates their
capacity to interact with multiple partners and suggests the
possibility that these membrane proteins may form part of a
signalosome-like complex, thereby mediating at the cell surface multiple
signaling pathways as is the case for the uPA-uPAR system (58).

Concluding Remarks

What is known about the TTSPs is that they function or have the
structural motifs necessary to function as serine proteases. What can
be speculated upon is that their numerous and varied nonproteolytic
domains are likely to mediate interactions with proteolytic substrates
and inhibitors as well as other proteins and ligands. Such
interactions will potentially regulate the proteolytic activity of the catalytic
domain but perhaps may also have functions quite independent of
this domain. Furthermore, given the integral plasma membrane
nature of the TTSPs, it is tempting to speculate that at least some of
the TTSPs will function directly in transducing signals across the
plasma membrane, as has been suggested for the Drosophila TTSP
st-sb (11). There is clearly a need for a greater understanding of the
biology and physiological functions of this group of unique proteases
to obtain a better picture of the dynamics occurring on the cell
surface. Because of the mosaic structure of the TTSPs it will be
important to understand the role of their individual domains as well
as the role of each protein in toto.

Note Added in Proof— Two cDNAs encoding the putative TTSPs Xeap-2 
and XMT-SP1 have recently been identified from Xenopus laevis (67).

REFERENCES 

1. Stone, A. L., Kroeger, M., and Sang, Q. X. (1999) J. Protein Chem. 18, 447-465
2. Seiki, M. (1999) APMIS 107, 137-143
3. Kitamoto, Y., Veile, R. A., Donis-Keller, H., and Sadler, J. E. (1995) Biochemistry 34, 4562-4568
4. Leytus, S. P., Loeb, K. R., Hagen, F. S., Kurachi, K., and Davie, E. W. (1988) Biochemistry 27, 1067-1074
5. Yamaoka, K., Masuda, K., Ogawa, H., Takagi, K., Umemoto, N., and Yasuoka, S. (1998) J. Biol. Chem. 273, 11895-11901
6. Yan, W., Sheng, N., Seto, M., Morser, J., and Wu, Q. (1999) J. Biol. Chem. 274, 14926-14935
7. Takeuchi, T., Shuman, M. A., and Craik, C. S. (1999) Proc. Natl. Acad. Sci. U. S. A. 96, 11054-11061
8. Lin, C.-Y., Anders, J., Johnson, M., Sang, Q. A., and Dickson, R. B. (1999) J. Biol. Chem. 274, 18231-18236
9. Paoloni-Giacobino, A., Chen, H., Peitsch, M. C., Rossier, C., and Antonarakis, S. E. (1997) Genomics 44, 309-320
10. Wallrapp, C., Hahnel, S., Muller-Pillasch, F., Burghardt, B., Iwamura, T., Ruthenburger, M., Lerch, M. M., Adler, G., and Gress, T. (2000) Cancer Res. 60, 2602-2606
11. Appel, L. F., Prout, M., Abu-Shumays, R., Hammonds, A., Garbe, J. C., Fristrom, D., and Fristrom, J. (1993) Proc. Natl. Acad. Sci. U. S. A. 90, 4937-4941

12. Yuan, X., Zheng, X., Lu, D., Rubin, D. C., Pung, C. Y., and Sadler, J. E. (1998) Am. J. Physiol. 274, G342-G349
13. Yahagi, N., Ichinose, M., Matsushima, M., Matsubara, Y., Miki, K., Kurokawa, K., Fukamachi, H., Tashiro, K., Shiokawa, K., Kageyama, T., Takahashi, T., Inoue, H., and Takahashi, K. (1996) Biochem. Biophys. Res. Commun. 219, 806-812
14. Kitamoto, Y., Yuan, X., Wu, Q., McCourt, D. W., and Sadler, J. E. (1994) Proc. Natl. Acad. Sci. U. S. A. 91, 7588-7592
15. Matsushima, M., Ichinose, M., Yahagi, N., Kakei, N., Tsukada, S., Miki, K., Kurokawa, K., Tashiro, K., Shiokawa, K., Shinomiya, K., Umeyama, H., Inoue, H., Takahashi, T., and Takahashi, H. (1994) J. Biol. Chem. 269, 19976-19982
16. Vu, T. K. H., Liu, R. W., Haaksma, C. J., Tomasek, J. J., and Howard, E. W. (1997) J. Biol. Chem. 272, 31315-31320
17. Farley, D., Raymond, F., and Nick, R. (1993) Biochim. Biophys. Acta 1173, 350-352
18. Tomita, Y., Kim, D. H., Magoori, K., Fujino, T., and Yamamoto, T. T. (1998) J. Biochem. (Tokyo) 124, 784-789
19. Kim, M. G., Chen, C., Lyu, M. S., Cho, E. G., Park, D., Kozak, C., and Schwartz, R. H. (1999) Immunogenetics 48, 420-428
20. Jacquinet, E., Rao, N. V., Rao, G. V., and Hoidal, J. R. (2000) FEBS Lett. 468, 93-100

21. Louvard, D., Maroux, S., Baratti, J., and Desnuelle, P. (1973) Biochim. Biophys. Acta 309, 127-137
22. Fonseca, P., and Light, A. (1983) J. Biol. Chem. 258, 14516-14520
23. Yasuoka, S., Ohnishi, T., Kawano, S., Tsuchihashi, S., Ogawara, M., Masuda, K., Yamaoka, K., Takahashi, M., and Sano, T. (1997) Am. J. Respir. Cell Mol. Biol. 16, 300-308
24. Lin, C.-Y., Anders, J., Johnson, M., and Dickson, R. B. (1999) J. Biol. Chem. 274, 18237-18242
25. Schechter, I., and Berger, A. (1967) Biochem. Biophys. Res. Commun. 27, 157-162
26. Lu, D., Yuan, X., Zheng, X., and Sadler, J. E. (1997) J. Biol. Chem. 272, 31293-31300
27. Zheng, X., Lu, D., and Sadler, J. E. (1999) J. Biol. Chem. 274, 1596-1606
28. Kazama, Y., Hamamoto, T., Foster, D. C., and Kisiel, W. (1995) J. Biol. Chem. 270, 66-72
29. Zacharski, L. R., Ornstein, D. L., Memoli, V. A., Rousseau, S. M., and Kisiel, W. (1998) Thromb. Haemostasis 79, 876-877
30. Lin, C.-Y., Wang, J. K., Torri, J., Dou, L., Sang, Q. A., and Dickson, R. B. (1997) J. Biol. Chem. 272, 9147-9152
31. Takeuchi, T., Harris, J., Huang, W., Yan, K. W., Coughlin, S. R., and Craik, C. S. (2000) J. Biol. Chem. 275, 26333-26342
32. Keller, P., and Simons, K. (1997) J. Cell Sci. 110, 3001-3009
33. Brown, M. S., Herz, J., and Goldstein, J. L. (1997) Nature 388, 629-630
34. Nykjaer, A., Conese, M., Christensen, E. I., Olson, D., Cremona, O., Gliemann, J., and Blasi, F. (1997) EMBO J. 16, 2610-2620
35. Kounnas, M. Z., Church, F. C., Argraves, W. S., and Strickland, D. K. (1996) J. Biol. Chem. 271, 6523-6529

36. Resnick, D., Chatterton, J. E., Schwartz, K., Slayter, H., and Krieger, M. (1996) J. Biol. Chem. 271, 26924-26930
37. Cadigan, K. M., and Nusse, R. (1997) Genes Dev. 11, 3286-3305
38. Bork, P., and Beckmann, G. (1993) J. Mol. Biol. 231, 539-545
39. Bork, P., and Patthy, L. (1995) Protein Sci. 4, 1421-1425
40. Beckmann, G., and Bork, P. (1993) Trends Biochem. Sci. 18, 40-41
41. Muta, T., Hashimoto, R., Miyata, T., Nishimura, H., Toh, Y., and Iwanaga, S. (1990) J. Biol. Chem. 265, 22426-22433
42. Hynes, R. O. (1992) Cell 69, 11-25
43. Hooper, J. D., Scarman, A. L., Clarke, B. E., Normyle, J. P., and Antalis, T. M. (2000) Eur. J. Biochem. 267, 6931-6937
44. Lin, B., Ferguson, C., White, J. T., Wang, S., Vessella, R., True, L. D., Hood, L., and Nelson, P. S. (1999) Cancer Res. 59, 4180-4184
45. Tsuji, A., Torres-Rosado, A., Arai, T., Le Beau, M. M., Lemons, R. S., Chou, S. H., and Kurachi, K. (1991) J. Biol. Chem. 266, 16948-16953
46. Tanimoto, H., Yan, Y., Clarke, J., Korourian, S., Shigemasa, K., Parmley, T. H., Parham, G. P., and O'Brien, T. J. (1997) Cancer Res. 57, 2884-2887
47. Pavlov, I. P. (1902) The Work of the Digestive Glands, 1st Ed., pp. 148-163, translated by W. H. Thompson, Charles Griffin & Co., London
48. Zamolodchikova, T. S., Sokolova, E. A., Lu, D., and Sadler, J. E. (2000) FEBS Lett. 468, 295-299
49. Lu, D., and Sadler, J. E. (1998) in Handbook of Proteolytic Enzymes (Barrett, A. J., Rawlings, N. D., and Woessner, J. F., eds) pp. 50-54, Academic Press Ltd., London
50. Kong, W., McConalogue, K., Khitin, L. M., Hollenberg, M. D., Payan, D. G., Bohm, S. K., and Bunnett, N. W. (1997) Proc. Natl. Acad. Sci. U. S. A. 94, 8884-8889

51. Dery, O., and Bunnett, N. W. (1999) Biochem. Soc. Trans. 27, 246-254
52. Hollenberg, M. D. (1999) Trends Pharmacol. Sci. 20, 271-273
53. Torres-Rosado, A., O'Shea, K. S., Tsuji, A., Chou, S. H., and Kurachi, K. (1993) Proc. Natl. Acad. Sci. U. S. A. 90, 7181-7185
54. Wu, Q., Yu, D., Post, J., Halks, M. M., Sadler, J. E., and Morser, J. (1998) J. Clin. Invest. 101, 321-326
55. Kawamura, S., Kurachi, S., Deyashiki, Y., and Kurachi, K. (1999) Eur. J. Biochem. 262, 756-764
56. Yan, W., Wu, F., Morser, J., and Wu, Q. (2000) Proc. Natl. Acad. Sci. U. S. A. 97, 8525-8529
57. Lang, R. E., Tholken, H., Ganten, D., Luft, F. C., Ruskoaho, H., and Unger, T. (1985) Nature 314, 264-266
58. Koshelnick, Y., Ehart, M., Stockinger, H., and Binder, B. R. (1999) Thromb. Haemostasis 82, 305-311
59. Wolfsberg, T. G., Primakoff, P., Myles, D. G., and White, J. M. (1995) J. Cell Biol. 131, 275-278
60. Wolfsberg, T. G., and White, J. M. (1996) Dev. Biol. 180, 389-401
61. Schlondorff, J., and Blobel, C. P. (1999) J. Cell Sci. 112, 3603-3617
62. Carmeliet, P., Moons, L., Lijnen, R., Baes, M., Lemaitre, V., Tipping, P., Drew, A., Eeckhout, Y., Shapiro, S., Lupu, F., and Collen, D. (1997) Nat. Genet. 17, 439-444
63. Nagase, H., Enghild, J. J., Suzuki, K., and Salvesen, G. (1990) Biochemistry 29, 5783-5789
64. Ramos-DeSimone, N., Hahn-Dantona, E., Sipley, J., Nagase, H., French, D. L., and Quigley, J. P. (1999) J. Biol. Chem. 274, 13066-13076
65. Blasi, F. (1999) Thromb. Haemostasis 82, 298-304
66. Hattori, M. et al. (2000) Nature 405, 311-319
67. Yamada, K., Takabatake, T., and Takeshima, K. (2000) Gene (Amst.) 252, 209-216



Exhibit 16 







FEBS Letters 468 (2000) 93-100




Cloning, genomic organization, chromosomal assignment and expression
of a novel mosaic serine proteinase: epitheliasin

Eric Jacquinet, Narayanam V. Rao, Gopna V. Rao, John R. Hoidal*

Department of Internal Medicine, Division of Respiratory, Critical Care and Occupational Medicine, Pulmonary Division, Wintrobe Building,
Rm 743A, 50 N. Medical Drive, University of Utah Health Sciences Center and VA Medical Center, Salt Lake City, UT 84132, USA

Received 20 January 2000 

Edited by Horst Feldmann 



Abstract We report the isolation of a cDNA encoding a novel
murine serine proteinase, epitheliasin. The cDNA spans 1753 bp
and encodes a mosaic protein with a calculated molecular mass of
53529 Da. Its domains include a cytoplasmic tail, a type II
transmembrane domain, a low-density lipoprotein receptor class
A domain, a cysteine rich scavenger receptor-like domain and a
serine proteinase domain. The proteinase portion domain shows
46-53% identity with mouse neurotrypsin, acrosin, hepsin and
enteropeptidase. The gene, located in the telomeric region in the
long arm of mouse chromosome 16, consists of 14 exons and 13
introns and spans approximately 18 kb. Epitheliasin is expressed
primarily in the apical surfaces of renal tubular and airway
epithelial cells.

© 2000 Federation of European Biochemical Societies.
Key words: Serine proteinase; Mosaic protein; Epitheliasin



1. Introduction



Proteinases are implicated in a wide spectrum of physiologic
and pathophysiological processes in the kidney. Renin,
a proteinase synthesized in renal cortical cells, plays a major
role in the regulation of blood pressure and electrolyte balance
by converting angiotensinogen to angiotensin I. Furthermore,
the renal kallikrein-kinin system activated under conditions
of mineralocorticoid excess represents a compensatory
response against the development of hypertension and renal
injury induced by salt excess. Proteolytic enzymes also have
been ascribed important roles in both leukocyte-dependent
and independent models of glomerular diseases (reviewed in
[1]). Recently, Vallet and colleagues identified a novel serine
proteinase from Xenopus laevis kidney epithelial cells, CAP1,
involved in activation of the epithelial sodium channel, ENaC
[2]. This was the first report of channel activating activity of
an endogenous proteinase.

In the present report, we describe a novel serine proteinase
expressed in murine renal epithelial cells with sequence homology
to CAP1. The enzyme, that we term epitheliasin, is a
modular protein consisting of five sequence motifs, a cytoplasmic
tail, a type II transmembrane (TM) domain, a low-density
lipoprotein receptor class A (LDLRA)-like domain, a cysteine
rich scavenger receptor-like (SRCR) domain and a serine
proteinase domain. The sequence and structural features of
epitheliasin cDNA and gene, its chromosomal localization and
tissue expression are described. Epitheliasin has sequence
identity to a human cDNA recently cloned by exon trapping
named TMPRSS2 [3]. However, the tissue distribution of
epitheliasin and TMPRSS2 is strikingly different.

*Corresponding author. Fax: (1)-801-585 3355.
E-mail: jhoidal@med.utah.edu

0014-5793/00/$20.00 © 2000 Federation of European Biochemical Societies. All rights reserved.
PII: S0014-5793(00)01196-0

2. Materials and methods 

2.1. Materials

Multiple tissue Northern blots, ExpressHyb hybridization solution,
rapid amplification of cDNA ends (RACE) ready cDNAs from mouse
kidneys and Marathon cDNA kits were from CLONTECH (Palo
Alto, CA, USA). TA cloning kits were from Invitrogen (Carlsbad,
CA, USA). LA PCR kits were from Panvera (Madison, WI, USA).
Klenow DNA polymerase, [α-32P]dCTP (3000 Ci/mmol) and
[γ-32P]dATP (3000 Ci/mmol) were from Amersham Life Science (Arlington
Heights, IL, USA). BupH Tris-glycine SDS, Tris-glycine and
Immunogen Conjugation kits were from Pierce (Rockford, IL, USA).
Alkaline phosphatase conjugated goat anti-rabbit antibody was
from Zymed (San Francisco, CA, USA). BCIP/NBT tablets were
from Sigma (St Louis, MO, USA). Citra solution and VIP substrate
were from Vector Laboratories (Burlingame, CA, USA). Blocking
reagent, SA-HRP and biotinyl tyramide were supplied by NEN Life
Science Products (Boston, MA, USA).

2.2. Identification and cloning of epitheliasin cDNA

A conserved sequence around the serine active site residue
(GGIDSCQGDSGGPLVC) was used to search the mouse EST database
using TBLASTn. Of the 100 ESTs initially identified, a novel
EST (ubSSgOl.sl) containing 3S9 nt and its mirror sequence
(ub5Sg0l.rl) were further analyzed using the non-redundant
databases, BLASTn and BLASTx. Four overlapping sequences were
found from these searches: one was from a kidney library
(ucSlcl l.yl), two from a mammary gland library (vfB6g09.rl,
ve37cl2.rl), and one from a blastocyst library (vI64c03.rl).
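
The idea behind searching a nucleotide EST collection with a protein query can be illustrated with a minimal sketch. It is not TBLASTn itself, which scores gapped, inexact alignments; this simplified version only looks for an exact copy of a peptide motif in the six reading frames of a DNA sequence. The function names and the toy sequence are hypothetical and are not taken from the paper.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (last base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement; non-ACGT symbols (e.g. N) are kept as-is."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate from the first base; codons with unknown bases become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_motif_hits(dna: str, motif: str) -> list:
    """List (frame, amino-acid offset) for every exact motif match in six frames."""
    hits = []
    for sign, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            protein = translate(strand[offset:])
            start = protein.find(motif)
            while start != -1:
                hits.append((f"{sign}{offset + 1}", start))
                start = protein.find(motif, start + 1)
    return hits


# Toy example (hypothetical sequence): one extra base shifts the motif into frame +2.
toy_est = "A" + "GGTGATTCTGGTGGTCCT"  # encodes G-D-S-G-G-P
print(six_frame_motif_hits(toy_est, "GDSGGP"))
```

A real search would tolerate mismatches and rank hits by alignment score, which is why an alignment tool rather than exact matching is used in practice.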

To obtain the full-length cDNA of interest the RACE strategy was
employed. Initially, LA PCR was utilized to amplify mouse kidney
cDNA employing a sense primer (5'-CCATACTCAACTCCTCATGCTGCT-3')
designed based on the novel sequence and an
anchor primer, AP1. The initial PCR product was subjected to nested
PCR using a sense (5'-CTGACACAGCCAGGATGGCATTG-3')
and an anti-sense primer (5'-GTGGATTAGCTGTTCGCCCTCATT-3').
This nested reaction amplified a 1.5 kb product
that was ligated into the pCR2.1 vector and sequenced using an
ABI automatic sequencer.

To obtain the 3' end, mouse kidney cDNA was subjected to
3'-RACE. The cDNA was amplified using AP1 and a sense primer
(5'-CCATACTGAACTCCTCATGCTGCT-3'). The product was
diluted (1:50) and a nested PCR amplification was performed using
a second anchor primer, AP2, and a sense primer
(5'-CTGACACAGGCAGGATGGCATTG-3'). The 2 kb PCR
product obtained was cloned and sequenced as described above.
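
Basic properties of primers such as those quoted above can be checked with a short sketch. It computes GC content, a rough melting temperature by the Wallace rule (2 °C per A/T, 4 °C per G/C, a rule of thumb intended for short oligonucleotides and only approximate for 23-24-mers), and the reverse complement. The helper names are hypothetical, and the example sequence is the first sense primer as printed in this copy, which may contain transcription errors.

```python
def base_counts(primer: str) -> dict:
    """Count A, C, G and T in a primer; reject empty input or other symbols."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    return {base: seq.count(base) for base in "ACGT"}


def gc_fraction(primer: str) -> float:
    """Fraction of bases that are G or C."""
    counts = base_counts(primer)
    return (counts["G"] + counts["C"]) / sum(counts.values())


def wallace_tm(primer: str) -> int:
    """Wallace-rule melting temperature estimate in degrees Celsius."""
    counts = base_counts(primer)
    return 2 * (counts["A"] + counts["T"]) + 4 * (counts["G"] + counts["C"])


def reverse_complement(primer: str) -> str:
    """Reverse complement, e.g. to view an anti-sense primer on the sense strand."""
    return primer.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


sense_primer = "CCATACTCAACTCCTCATGCTGCT"  # as printed above; treat as illustrative
print(len(sense_primer), gc_fraction(sense_primer), wallace_tm(sense_primer))
```

Nearest-neighbor thermodynamic models give more realistic melting temperatures; the Wallace rule is used here only because it can be verified by hand.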

2.3. Genomic cloning and analysis 

To obtain the epitheliasin gene, a mouse genomic bacterial artificial
chromosome (BAC) library (Genome Systems, St Louis, MO, USA)
was screened using a 0.7 kb probe extending from 831 to 1477 nt of
mouse epitheliasin cDNA. A single clone (BAC-24) was identified and
confirmed by sequencing to contain the entire epitheliasin gene. To
identify the intron junction borders, DNA from BAC-24 was directly
sequenced using oligonucleotide primers defined initially by the cDNA
sequences and subsequently by derived sequences. Southern analysis
was used to determine the size of the epitheliasin gene.

2.4. Chromosomal assignment

The plasmid clone (BAC-24) obtained from the genomic library was
used as a probe for chromosomal localization by fluorescence in situ
hybridization (FISH). The probe was nick translation-labeled with
biotin, hybridized to metaphase chromosomes and detected with
Cy-3-conjugated streptavidin. Chromosome spreads were prepared
by standard procedures and G-banded after trypsin treatment and
Wright's staining. Hybridization and detection conditions on
metaphase chromosomes were performed as previously described [4]. Probe
signals were detected with the Cy3 conjugate viewed using an
epifluorescence microscope. The fluorescence image was overlaid on the
G-banded image to localize the gene.

2.5. Northern blot analysis 

Mouse multi-tissue blots containing 2 µg of poly(A)+ RNA in each
lane were prehybridized for 1 h at 68°C, then hybridized at 68°C with
a 1.5 kb [α-32P]dCTP-labeled probe that represented the coding region
of the mouse epitheliasin cDNA. After low stringency washes, the
blots were washed at high stringency at 50°C and autoradiographed.

2.6. Production of antibodies against epitheliasin

Rabbit polyclonal antiserum was raised to a synthetic peptide
(HPNYDSKTKNND) located in the serine proteinase region
of epitheliasin. The peptide was chosen based on predicted surface
hydrophilicity and antigenicity. The peptide was coupled to keyhole
limpet hemocyanin. Subcutaneous injections were given to rabbits
with 100 µg of conjugate that was emulsified in Freund's complete
adjuvant and then boosted with the same amount of antigen in
Freund's incomplete adjuvant at 2 week intervals until a titer of
> 1:4000 was obtained. The presence of anti-peptide antibodies was
assessed by dot blot analysis using the peptide linked to ovalbumin as
the antigen.

2.7. Immunohistology

Mouse kidneys and lungs were fixed in buffered 10% formaldehyde
and embedded in paraffin. Sections were cut at 5 µm depths,
deparaffinized and rehydrated. Following antigen retrieval performed
with 1× Citra solution in a microwave oven for 15 min at 700-900 W,
the samples were washed in PBS. Endogenous peroxidase activity was
blocked with 20% methanol and ?% H2O2 in PBS for 30 min at
room temperature. The tissue was permeated using 10% Triton X-100
in PBS for 20 min at room temperature. Endogenous biotin
was blocked by Vector Block avidin solution for 30 min at room
temperature followed by Vector Blocking solution for 30 min at
room temperature. The sections were then incubated with epitheliasin
peptide anti-serum, dilution 1/500 in Block solution, overnight at 4°C
in a humid chamber. After washing with TNT, 1/500 horse anti-rabbit
IgG serum in TNT was applied for 30 min at room temperature. The
slides were then incubated with 1/100 SA-HRP in TNT for 30 min at
room temperature. The signal was amplified with biotinyl tyramide
for 5 min at room temperature. This was followed by a re-incubation
with 1/100 SA-HRP in TNT. The signal was visualized using VIP
substrate solution. The same process was applied to the slides used
as controls, but epitheliasin anti-serum was replaced by non-immune
rabbit serum.



3. Results and discussion 

3.1. Cloning and analysis of the epitheliasin full-length cDNA

Fig. 1 shows the nucleic acid and deduced amino acid sequences
of the complete cDNA reconstituted from the RACE fragments. As
demonstrated by the immunohistochemistry described in a following
section, the encoded protein is highly expressed in epithelial
tissue. Accordingly, we named the protein epitheliasin. The
composite cDNA spans 1753 nt. A 5' untranslated region (UTR)
extends 100 nt. The first in-frame ATG (1-3 nt) was assigned as the
codon for the Met translation initiator since the sequence around
this codon (Xj GATGG) conforms to the Kozak consensus sequence for
mammalian protein biosynthesis [5]. A single open reading frame
begins with the ATG and extends 1470 nt. This is followed by a stop
codon, TAA (1471-1473 nt), and a 3' UTR of 152 nt, terminating in a
poly(A)+ tail of 28 nt. A consensus polyadenylation site (ATTAAA,
1600-1605 nt) is located 20 nt upstream of the poly(A)+ tail.
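The lengths quoted in this paragraph can be checked against one another with a short calculation. The sketch below uses only the figures stated in the text; the variable names are illustrative, and the coordinate convention (the A of the initiator ATG counted as nucleotide 1) is taken from the "1-3 nt" assignment above.

```python
# Segment lengths in nucleotides, taken from the text.
SEGMENTS_NT = {
    "5' UTR": 100,
    "open reading frame": 1470,
    "stop codon": 3,
    "3' UTR": 152,
    "poly(A) tail": 28,
}

composite_length = sum(SEGMENTS_NT.values())
encoded_residues = SEGMENTS_NT["open reading frame"] // 3

# Coordinates count the A of the initiator ATG as nucleotide 1.
stop_start = SEGMENTS_NT["open reading frame"] + 1
stop_end = stop_start + SEGMENTS_NT["stop codon"] - 1
utr3_end = stop_end + SEGMENTS_NT["3' UTR"]

polya_signal_end = 1605  # last nucleotide of ATTAAA, from the text
gap_to_tail = utr3_end - polya_signal_end

print(f"composite cDNA length: {composite_length} nt")
print(f"encoded residues: {encoded_residues}")
print(f"stop codon: {stop_start}-{stop_end} nt")
print(f"nucleotides between ATTAAA and the poly(A) tail: {gap_to_tail}")
```

Under these assumptions the segments add up to the stated 1753 nt, the stop codon falls at 1471-1473 nt, and the polyadenylation signal ends 20 nt before the tail, in agreement with the text.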



3.2. Characteristics of the sequence and structural features of
epitheliasin



The open reading frame encodes a protein of 490 amino acids with a
calculated molecular mass of 53 529 Da. Comparisons with sequences
in GenBank, EMBL and SWISS-PROT reveal that epitheliasin is a
multi-domain serine proteinase. A typical amino-terminal signal
sequence is not present, but a hydrophobic region is present
near the amino terminus (Leu84 to Trp105). This 22 amino
acid region is flanked by charged amino acids (Lys and
Arg) and corresponds to a transmembrane domain [6]. Based
on the difference in total charge between the 15-residue
sequences on either side of the membrane-spanning domain,
epitheliasin can be classified as a type II integral
membrane-bound protein [7,8] that has a cytosol-facing
amino-terminal tail region consisting of 83 amino acids (Met1 to
Ser83) and an extracellular-facing COOH-terminal modular region.
The absence of a signal peptide and the presence of a transmembrane
domain in epitheliasin are analogous to homologous serine
proteinases: enteropeptidase, a key enzyme in digestion that
is responsible for the conversion of trypsinogen to trypsin [9];
hepsin, a membrane-associated proteinase involved in the
formation of thrombin on cell surfaces [10]; and a recently
described human airway trypsin-like proteinase [11].

The predicted domain structure of epitheliasin is shown in
Fig. 2. An LDLRA domain extending from Cys"- to Cys'*?
and containing six cysteines follows the transmembrane domain.
This domain motif is found in a number of proteins
that are functionally unrelated to the LDLR family, including
clotting proteinases and enteropeptidase. In each of these
proteins the domain is thought to function as a protein-binding
domain. The LDLRA domain in epitheliasin is similar to
other typical LDLRA domains that are about 40 amino acids
long and contain six cysteines [12]. The cysteines form
intra-domain bridges resulting in a cluster of negatively charged
residues in a single loop positioned for high affinity binding
to positively charged sequences in LDLR ligands.

Following the LDLRA domain, an SRCR-like domain extends
from Val149 to Gly243. SRCR domains are classified into
two groups, group A and B, according to the number of
conserved cysteine residues, six or eight, respectively [13]. In a
recent analysis, all but one of the 33 independent SRCR
domains that had been previously identified had six or eight
cysteines [14]. An unusual feature of this domain in epitheliasin
is that it contains only four cysteines. These cysteine
residues in epitheliasin are completely conserved in position,
suggesting that the domain belongs to group A. The SRCR
domain that is closest to that in epitheliasin is present in
complement factor I (CFI), a serum proteinase that regulates
the complement cascade by cleaving C3b and C4b. CFI
contains a single SRCR domain with five cysteines [13].

The function of SRCR domains is largely unknown, but it
seems likely that most of these domains are involved in binding
to molecules on the cell surface or in the extracellular
space. Direct evidence supporting the idea that SRCR domains
mediate binding to other cell surface proteins or extracellular
proteins has recently been provided [14,15].

3.3. Features of serine proteinase domain

The proteinase domain begins with Ile254 and represents the
major domain (about 50%) of the encoded protein. The predicted
molecular mass of the domain is 25 892 Da. The domain contains
all the major features common to the S1 family of the
chymotrypsin (or SA) clan of serine proteinases. The residues
contributing to the salient structural features in chymotrypsin
include: (1) His57, Asp102 and Ser195 that make up the catalytic
triad, (2) Gly193, Asp194 and Ser195 that form the oxyanion hole
required for catalytic efficiency, (3) Ser214, Trp215 and Gly216
that bind the main-chain of a substrate and (4) residues that
occupy the bottom (Ser189) and sides (Gly216 and Gly226) of the
substrate specificity pocket (S1 subsite). All of the residues
contributing to the first three features and the residues Gly216
and Gly226 on the sides of the substrate specificity pocket of
chymotrypsin are strictly conserved in epitheliasin. However, in
epitheliasin the residue corresponding to Ser189 of chymotrypsin
is replaced by an acidic residue, Asp. This suggests that
epitheliasin has specificity for cleavage after Lys or Arg,
indicating a trypsin-like substrate specificity for the enzyme.

Table 1
Exon-intron junction organization of the epitheliasin gene

Column headings: 3' splice site; Exon size in nt (amino acids);
5' splice site; Phase

-«CAA CAO OSraAQAAGCaCGCCO
wi ' i nx.^rTc c TTCJia ore acc-" aac tca otaaotoctaattct
TTTCCCATTCTTTAa CGG TCA aCC TCA A OJTAAGACTCCTTACC
CTTTTCTTCCCGCAO AO TCT AGG TTC T oa^AOTTGGOGGCTG
K*" S** <291 R"* F"**
CCAATACAATGCCAO GG GAC AAC CG OTGTTCTGACTTATC
TTCTTTCTCCTTCAO T TGT GTT TAG AA OTGAGTATGGAACCC
vi4i (44) Y*** K****
TGTCTTTTTTTCCAa G AAC AAT CAC AG OrATOOAOTTTTTTC
jjlfj i^ifi (37) H'** S'*'
C '* i ' l '' i * l '' i * l XL'' l *l*TC C A a T GAC TCA TGT ATA G OTOACTGAGTACTTC
^2% s»a» (14) c'*** 1'**
GCTTGT CA CCCTCAO AA TGC GAA GA OTATGCCTCCATTCT
Cl ' l^ ' tXjiXJ ' tVtXJJ UtO A CCC CTC TTT AAT G OTAOOTCACACTCAG
CTCTTCTTTAAACAO AT CTA GAO AAA G OTGAGGCTTTOGGTC
1>»ST j^i5i (32) eJ" k'"
TGOCTCTCTTCTTAO
GG AAC TGC CAO aSTU^TTCTGAOTGOT
q)c* ]^s»e (46) c*'* O*'*
' mCl X/lV l 'lC t. 'C CA g GGA GAC ATO AGG CTTTATTTCCTCTATT
TTCCTATTTGCACAa CCG AAC

Fig. 2. The domain organization of epitheliasin. Starting at the
NH2-terminus, epitheliasin contains a TM domain followed by an
LDLRA domain, an SRCR-like domain, and finally the serine proteinase
domain. N-glycosylation sites are indicated by a circle. The numbers
in parentheses refer to the amino acid residues of each domain.
Figure labels: TM (84-105); SRCR (149-243); Serine Proteinase
(from 254); C-terminus 490.
Comparison of the amino acid sequence encoding the proteinase
domain in epitheliasin with other serine proteinases
indicates that this region of the protein shares identity with
mouse enteropeptidase (53%), hepsin (51%), acrosin (48%),
and neurotrypsin (46%), all multi-domain members of the
chymotrypsin family of serine proteinases with trypsin-like
substrate specificity. The aforementioned CAP1 from Xenopus
laevis kidney epithelial cells has a sequence identity with
epitheliasin of 44%.

Based on findings with related vertebrate trypsinogens we
predict that epitheliasin is synthesized as an inactive zymogen
that is converted to an active serine proteinase by cleavage of
the Arg253-Ile254 peptide bond in the extracellular domain of
the enzyme. Most vertebrate trypsinogens are activated by
proteolytic cleavage of a Lys (Arg)-Ile bond. The identity
or the origin of the proteinase responsible for this cleavage
in epitheliasin is not known. One possibility is that epitheliasin
is synthesized as a single-chain zymogen and undergoes
intracellular cleavage and activation by a furin-like enzyme
prior to insertion into the membrane. This is based on the
Arg-Gln-Ser-Arg253 sequence that immediately precedes the
Ile-Val-Gly-Gly257 sequence representing the NH2-terminus of the
proteinase domain. Arg-X-X-Arg motifs are furin recognition
sequences [16-20]. Interestingly, all the domains of epitheliasin
are flanked by recognition sites for furin-like enzymes,
suggesting the need to clarify the role of furin-like enzymes
in processing of epitheliasin.
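The minimal Arg-X-X-Arg motif mentioned above lends itself to a simple pattern search. The following is a sketch only: the function name `find_rxxr_motifs` is hypothetical, the test peptide is the eight-residue junction quoted in the text (Arg-Gln-Ser-Arg followed by Ile-Val-Gly-Gly) in one-letter code, and a match to R-X-X-R is assumed to indicate a candidate site rather than demonstrated furin cleavage.

```python
import re

# Zero-width lookahead so that overlapping R-X-X-R matches are all reported.
RXXR = re.compile(r"(?=(R..R))")


def find_rxxr_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched tetrapeptide) for each R-X-X-R."""
    return [(m.start() + 1, m.group(1)) for m in RXXR.finditer(protein.upper())]


# Junction peptide from the text: Arg-Gln-Ser-Arg / Ile-Val-Gly-Gly.
junction = "RQSRIVGG"

for position, motif in find_rxxr_motifs(junction):
    # Cleavage would be expected after the last Arg of the motif.
    cut_after = position + len(motif) - 1
    print(f"{motif} at residue {position}; candidate cleavage after residue {cut_after}")
```

Applied to a full-length sequence, the same search would list every candidate site, which is one way to examine the statement that the domains are flanked by furin-like recognition sites.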

Based on the structure of enteropeptidase and a comparison
with other chymotrypsin-like serine proteinases, we also predict
that epitheliasin, following intracellular cleavage, forms
two chains with the smaller chain containing the proteinase
domain, and the larger the membrane-spanning segment and
the LDLRA and SRCR-like domains that may serve as substrate
recognition sites. Several chymotrypsin-like serine proteinases
including enteropeptidase have a disulfide bond that
covalently links the two chains [21]. The proteinase domain in
epitheliasin contains eight Cys residues in conserved positions.
By comparison with chymotrypsin, three of the Cys pairs (42/58,
168/182 and 191/220) that form disulfide bond loops
around His57, Met180 and Ser195 are conserved in epitheliasin.
Although the other two cysteines (Cys*-- and Cys*^) are located
in conserved positions, their pairing counterparts Cys'
and Cys-*" that are involved in interchain disulfide bonds are
absent. This suggests that epitheliasin is likely distinct from
enteropeptidase and other multidomain serine proteinases in
that it lacks disulfide bond(s) between the proteinase motif
and the rest of the protein [22]. Thus, the mechanism of
association of the two chains in epitheliasin is not clear.

Three asparagine-linked glycosylation sites are present in
epitheliasin: Asn'" located at the beginning of the LDLRA
domain of the protein, Asn-'* located in the SRCR domain
and Asn"*'"* located in the proteinase domain (see Fig. 1).
Other features of the deduced primary structure of the protein
include a cAMP- or cGMP-dependent protein kinase
phosphorylation site (Lys--**'-Ser-*-). Two protein kinase C
phosphorylation sites are present in the cytoplasmic domain
(Thr"-Lys'^ and Thr*'^-Lys''-), three in the SRCR domain
(Ser'"-Arg'*^, Ser-''-Arg--*\, Ser"''-Arg--'''), one between
the SRCR domain and the proteinase domain (Ser-
Lys-"''^), and one in the proteinase domain (Thr"-*-Lys"*^').
Three casein kinase II phosphorylation sites are present, two
in the LDLRA domain (Ser"'-Glu"^, Ser"''-Glu"'), and the
last one in the proteinase domain (Ser-'^'-Asp-*^). Finally, an
ATP/GTP-binding site motif A is present in the proteinase
domain of epitheliasin, from Ile"' to Ala-^'*^. This motif is
found in a number of proteins including those in the myosin
and Ras families. The relevance of these various sites in
epitheliasin is not presently known.



Fig. 3. Schematic representation of the genomic organization of
epitheliasin. The intron placements (introns 1-13) are depicted in
relationship to the domains of the mouse epitheliasin protein. The
numbering represents nucleotides.

3.4. Genomic organization 

The epitheliasin gene contains 14 exons separated by 13
introns (Fig. 3). The first exon is located in the 5' untranslated
region. The last exon contains 9 bp of the coding sequence,
the stop codon and the 3' untranslated region. The exon
distribution reflects the organization of the deduced protein.
Exons 2 and 3, respectively 68 and 220 nt (M*-S"), encode
the cytoplasmic domain. Exon 4, 87 nt (K^-F'°'), encodes
the transmembrane domain. Exon 5, 117 nt (D^^^'-R'*),
encodes the LDLR domain (C'^-C'*'). An unusual feature
of epitheliasin is that the SRCR domain is encoded by
three exons, 6-8, respectively 130, 111 and 44 nt (C**^-I-^*).
Usually SRCR domains are encoded by one or two exons, in
regard to type B or type A, respectively. Exons 9-13,
respectively 169, 176, 96, 143 and 153 nt (E^^'-R*^), encode the
serine protease domain. Vertebrate serine protease-like genes
have been grouped into five classes based on intron positions
[23]. The gene organization of the epitheliasin protease domain
is typical of the second group, containing members of the
trypsin family of serine proteases and consisting of five exons
with each of the three components of the catalytic triad
encoded by sequences in a different exon. In epitheliasin, the
catalytic histidine is located in exon 9, the aspartic acid in exon
10 and the serine in exon 13. In general, the organization of
epitheliasin is similar to that of other multiple domain serine
proteinases. Each domain is coded in an independent manner
by one or more exons. A common feature among all multi-domain
proteases cloned to date is the five exons coding for
the serine proteinase domain [24].

As shown in Table 1, all intron/exon junctions contain the
expected GT splice donor and AG splice acceptor sites and
conform to the consensus sequences established for intronic
donor and acceptor splice signals [25]. Four introns are
inserted between codons (type 0 splice junction), five are after
the first nucleotide in a codon (type I splice junction), and
four after the second nucleotide in a codon (type II splice
junction). Six bands were strongly positive by Southern analysis
with sizes of 7000, 5000, 2700, 1400, 1200 and 900 nt. Adding
the size of the fragments indicates that the epitheliasin gene is
approximately 18 kb.
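The two tallies in this paragraph, the intron counts by splice-junction type and the summed Southern fragments, can be reproduced with a few lines. The sketch below uses only the numbers given in the text; the names are illustrative, and `intron_phase` simply expresses the usual definition of phase as the number of codon nucleotides preceding the intron.

```python
# Southern blot fragment sizes (nt) reported in the text.
SOUTHERN_FRAGMENTS_NT = [7000, 5000, 2700, 1400, 1200, 900]

# Intron counts by splice-junction type, as reported in the text.
INTRONS_BY_TYPE = {"type 0": 4, "type I": 5, "type II": 4}


def intron_phase(coding_nt_before_intron: int) -> int:
    """Phase = number of nucleotides of a codon preceding the intron (0, 1 or 2)."""
    return coding_nt_before_intron % 3


gene_size_nt = sum(SOUTHERN_FRAGMENTS_NT)
total_introns = sum(INTRONS_BY_TYPE.values())

print(f"summed Southern fragments: {gene_size_nt} nt (~{gene_size_nt / 1000:.0f} kb)")
print(f"introns accounted for: {total_introns}")
```

The fragment sum of 18 200 nt agrees with the stated estimate of approximately 18 kb, and the three junction types account for all 13 introns.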



Fig. 4. In situ hybridization of a biotin-labeled epitheliasin probe to
mouse metaphase cells. The chromosome 16 homologues are
identified with arrows. Specific labeling was observed at chromosome
band 16C2.



Fig. 5. Northern blot analysis of epitheliasin mRNA in various
mouse tissues. Each lane contained 2 µg of poly(A)+ RNA. The blot
was hybridized to an epitheliasin cDNA probe. Lane labels include
Heart, Brain, Spleen, Lung, Liver and Skeletal Muscle.



3.5. Chromosomal assignment

FISH was performed on normal mouse chromosomes using
a BAC containing the genomic sequence of epitheliasin (Fig.
4). These studies localized the epitheliasin gene to the telomeric
region in the long arm of chromosome 16. The band localization
was confirmed on G-banded chromosomes. The hybridization
efficiency was 92.5%. No other serine proteinases
have been localized to this region. The region is homologous
with the so-called 'Down's syndrome region' of human
chromosome region 21q22.2 and 21q22.3.

3.6. Expression of epitheliasin mRNA in vivo

The in vivo distribution of epitheliasin mRNA was investigated
in adult mouse tissues by Northern blot analysis. As
shown in Fig. 5, a prominent 2.8 kb transcript and a less
prominent 1.5 kb transcript were observed in the kidney.
Because of preliminary results that suggest an alternative
polyadenylation site approximately 1.3 kb downstream from the
initial polyadenylation site, we believe that the weaker signal
actually represents the characterized cDNA. A prominent 2.8
kb signal was also seen in the lung and a weaker signal of
similar size was observed in liver tissue. No signal was
observed in heart, brain, spleen, testis or skeletal muscle. Of
note, all tissues that express epitheliasin have epithelial cells
as a prominent feature of their cellular makeup.

3.7. Immunohistochemical localization

Fig. 6A shows the kidney in which only tubular epithelial
cells are stained with no staining of glomeruli. The staining is
restricted to cells located in distal tubules. The staining is
most intense at the apical pole of the cells, facing the lumen
of the tubules. The staining is faint in the cytoplasm, basal
and lateral side of the cells. Fig. 6B shows the lung in which
staining is primarily limited to the apical surface of airway
epithelial cells. Staining is minimal or absent in the vasculature
and alveolar spaces. No staining was observed in control
slides. Further analysis by in situ hybridization using an
epitheliasin riboprobe demonstrated that the pattern of mRNA
expression was the same as that of protein expression (data
not shown). These results support the epithelial and membrane
localization of epitheliasin.

Fig. 6. Immunohistochemical localization of epitheliasin in adult
mouse tissue. A: A section from the kidney (magnification 20×).
Positive staining is seen in the apical region of renal distal tubule
epithelial cells. B: A section from lung (magnification 20×). Positive
staining is seen in bronchial epithelial cells. No stain was observed
in control sections in which normal rabbit serum substituted for
rabbit anti-mouse epitheliasin (data not shown).

During the course of this investigation Paoloni-Giacobino
and colleagues reported on a human cDNA cloned by exon
trapping named TMPRSS2 [3]. The portion of the TMPRSS2
cDNA that was reported has approximately 80% sequence
identity to epitheliasin. However, the tissue distribution of
epitheliasin and TMPRSS2 is strikingly different. While
epitheliasin is highly expressed in the mouse kidney, no
expression of TMPRSS2 was observed in the human kidney. In
contrast, no expression of epitheliasin was observed in the
mouse heart or brain, while a high level of expression of
TMPRSS2 was observed in human heart and an intermediate
level in brain. Moreover, the size of the epitheliasin mRNA
transcript (2.8 kb) and that of TMPRSS2 (3.8 kb) are different.
Whether TMPRSS2 is the human orthologue of epitheliasin
or a closely related gene product will require further
study.

The biological role of epitheliasin is not known. The homology
with CAP1 and apical membrane distribution raise
the possibility that epitheliasin may activate ion transport
channels of the plasma membrane. In addition, cell-surface
proteinases of normal and malignant cells are thought to
play roles in cell growth, chemotaxis, endocytosis, exocytosis,
blood coagulation, fibrinolysis and tissue invasion during
metastasis [26]. While the function of the non-proteinase
domains is unexplored, the presence of these domains with a
modular organization represents a common feature of regulatory
serine proteinases (e.g. proteinases of the fibrinolytic
and blood coagulation systems). Studies of the kinetic effects
of deleting the non-proteinase domain from enteropeptidase
clearly implicate it in the recognition of macromolecular
substrates and inhibitors [21].

Acknowledgements: The work was supported by HL 50153 and HL
37615 from the NHLBI. We gratefully acknowledge the assistance of
Dr. Kurt Albertine and Zhengming Wang for the immunohistochemical
studies. GenBank accession number for epitheliasin nucleotide
sequence: BankIt243070 AF113596.

References

[1] Baricos, W.H. and Shah, S.V. (1991) Kidney Int. 40, 161-173.
[2] Vallet, V., Chraibi, A., Gaeggeler, H.P., Horisberger, J.D. and
Rossier, B.C. (1997) Nature 389, 607-610.
[3] Paoloni-Giacobino, A., Chen, H., Peitsch, M.C., Rossier, C. and
Antonarakis, S.E. (1997) Genomics 44, 309-320.
[4] Pinkel, D., Straume, T. and Gray, J.W. (1986) Proc. Natl. Acad.
Sci. USA 83, 2934-2938.
[5] Kozak, M. (1986) Cell 44, 283-292.
[6] von Heijne, G. and Manoil, C. (1990) Protein Eng. 4, 109-112.
[7] High, S. (1992) BioEssays 14, 535-540.
[8] Semenza, G. (1986) Annu. Rev. Cell Biol. 2, 255-313.
[9] Matsushima, M., Ichinose, M., Yahagi, N., Kakei, N., Tsukada,
S., Miki, K., Kurokawa, K., Tashiro, K., Shiokawa, K.,
Shinomiya, K., Umeyama, H., Inoue, H., Takahashi, T. and
Takahashi, K. (1994) J. Biol. Chem. 269, 19976-19982.
[10] Kazama, Y., Hamamoto, T., Foster, D.C. and Kisiel, W. (1995)
J. Biol. Chem. 270, 66-72.
[11] Yamaoka, K., Masuda, K.-I., Ogawa, H., Takagi, K.-I.,
Umemoto, N. and Yasuoka, S. (1998) J. Biol. Chem. 273,
11895-11901.
[12] Sudhof, T.C., Goldstein, J.L., Brown, M.S. and Russell, D.W.
(1985) Science 228, 815-822.
[13] Resnick, D., Pearson, A. and Krieger, M. (1994) Trends
Biochem. Sci. 19, 5-8.
[14] Whitney, G.S., Starling, G.C., Bowen, M.A., Modrell, B.,
Siadak, A.W. and Aruffo, A. (1995) J. Biol. Chem. 270,
18187-18190.
[15] Bowman, A. and Drummond, A.H. (1984) Br. J. Pharmacol. 81
(4), 665-674.
[16] Bresnahan, P.A., Hayflick, J.S., Molloy, S.S. and Thomas, G.
(1993) in: Mechanisms of Intracellular Trafficking and
Processing of Proproteins (Loh, Y.P., Ed.), pp. 225-250, CRC
Press, Boca Raton, FL.
[17] Creemers, J.W.M., Siezen, R.J., Roebroek, A.J.M., Ayoubi,
T.A.Y., Huylebroeck, D. and Van de Ven, W.J.M. (1993)
J. Biol. Chem. 268, 21826-21834.
[18] Hatsuzawa, K., Nagahama, M., Takahashi, S., Takada, K.,
Murakami, K. and Nakayama, K. (1992) J. Biol. Chem. 267,
16094-16099.
[19] Molloy, S.S., Bresnahan, P.A., Leppla, S.H., Klimpel, K.R. and
Thomas, G. (1992) J. Biol. Chem. 267, 16396-16402.
[20] Van de Ven, W.J.M. and Roebroek, A.J.M. (1993) Crit. Rev.
Oncog. 4, 115-136.
[21] Lu, D., Yuan, X., Zheng, X. and Sadler, J.E. (1997) J. Biol.
Chem. 272 (50), 31293-31300.
[22] Delabar, J.M., Theophile, D., Rahmani, Z., Chettouh, Z.,
Blouin, J.L., Prieur, M., Noel, B. and Sinet, P.M. (1993) Eur.
J. Hum. Genet. 1, 114-124.
[23] Irwin, D.M., Robertson, K.A. and MacGillivray, R.T. (1988)
J. Mol. Biol. 200, 31-45.
[24] Cool, D.E. and MacGillivray, R.T. (1987) J. Biol. Chem. 262,
13662-13673.
[25] Breathnach, R. and Chambon, P. (1981) Annu. Rev. Biochem.
50, 349-383.
[26] Bond, J.S. (1991) Biomed. Biochim. Acta 50, 775-780.




Exhibit 17 



Proc. Natl. Acad. Sci. USA
Vol. 91, pp. 7588-7592, August 1994
Biochemistry

Enterokinase, the initiator of intestinal digestion, is a mosaic
protease composed of a distinctive assortment of domains

(serine proteases/trypsinogen activation)

Yasunori Kitamoto*, Xin Yuan†, Qingyu Wu*, David W. McCourt†, and J. Evan Sadler*†‡

*Howard Hughes Medical Institute, †Departments of Medicine and Biochemistry and Molecular Biophysics, The Jewish Hospital of St. Louis, Washington
University School of Medicine, St. Louis, MO 63110

Communicated by Earl W. Davie, April 19, 1994



ABSTRACT Enterokinase is a protease of the intestinal
brush border that specifically cleaves the acidic propeptide
from trypsinogen to yield active trypsin. This cleavage initiates
a cascade of proteolytic reactions leading to the activation of
many pancreatic zymogens. The full-length cDNA sequence for
bovine enterokinase and partial cDNA sequence for human
enterokinase were determined. The deduced amino acid
sequences indicate that active two-chain enterokinase is derived
from a single-chain precursor. Membrane association may be
mediated by a potential signal-anchor sequence near the amino
terminus. The amino terminus of bovine enterokinase also
meets the known sequence requirements for protein
N-myristoylation. The amino-terminal heavy chain contains domains
that are homologous to segments of the low density lipoprotein
receptor, complement components C1r and C1s, the macrophage
scavenger receptor, and a recently described motif
shared by the metalloprotease meprin and the Xenopus A5
neuronal recognition protein. The carboxyl-terminal light
chain is homologous to the trypsin-like serine proteases. Thus,
enterokinase is a mosaic protein with a complex evolutionary
history. The amino acid sequence surrounding the amino
terminus of the enterokinase light chain is ITPK-IVGG (human)
or VSPK-IVGG (bovine), suggesting that single-chain
enterokinase is activated by an unidentified trypsin-like protease
that cleaves the indicated Lys-Ile bond. Therefore,
enterokinase may not be the "first" enzyme of the intestinal
digestive hydrolase cascade. The specificity of enterokinase for
the DDDDK-I sequence of trypsinogen may be explained by
complementary basic-amino acid residues clustered in potential
S2-S5 subsites.



All animals need to digest exogenous macromolecules with- 
out destroying similar endogenous constituents. The regula- 
tion of digestive enzymes is, therefore, a fundamental re- 
quirement (1). Vertebrates have solved this problem, in part, 
by using a two-step enzymatic cascade to convert pancreatic 
zymogens to active enzymes in the lumen of the gut. The 
basic features of this cascade were described in 1899 by N. P. 
Schepovalnikov, working in the laboratory of I. P. Pavlov
(2). Extracts of the proximal small intestine were shown to 
strikingly activate the latent hydrolytic enzymes in pancre- 
atic fluid. Pavlov considered this intestinal factor to be an 
enzyme that activated other enzymes, or a "ferment of 
ferments," and named it "enterokinase." The importance of 
this protease cascade is emphasized by the life-threatening 
intestinal malabsorption that accompanies congenital defi- 
ciency of enterokinase (3, 4). 

The publication costs of this article were defrayed in part by page charge
payment. This article must therefore be hereby marked "advertisement"
in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Enterokinase activates bovine trypsinogen by cleaving
after the sequence VDDDDK, releasing an amino-terminal
activation peptide (5, 6). The acidic DDDDK sequence of the
trypsinogen-activation peptide is conserved among vertebrates
(7), except for the similar sequences of trypsinogens
from lungfish (IEEDK and LEDDK) and African clawed frog
(FDDDK). Enterokinase prefers substrates with the sequence
DDDDK, whereas the presence of aspartate residues
markedly inhibits the ability of trypsin to cleave such
substrates (8). For example, toward bovine trypsinogen the
catalytic efficiency of enterokinase is 12,000-fold (porcine)
(9) or 34,000-fold (bovine) (10) greater than that of bovine
trypsin. This reciprocal specificity protects trypsinogen
against autoactivation by trypsin and promotes activation by
enterokinase in the gut.
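The cleavage rule stated above (hydrolysis of the bond following the DDDDK sequence) can be expressed as a short sketch. The function name `cleave_after_motif` and the example peptide are illustrative assumptions: the peptide joins the VDDDDK activation sequence from the text to a hypothetical IVGG start of the mature chain, and exact matching of DDDDK is a simplification that ignores the variant sequences (IEEDK, LEDDK, FDDDK) noted for lungfish and frog.

```python
from typing import Optional


def cleave_after_motif(protein: str, motif: str = "DDDDK") -> Optional[tuple[str, str]]:
    """Split a protein after the first occurrence of the recognition motif.

    Returns (activation_peptide, remaining_chain), or None when the motif is
    absent. This models recognition as exact string matching, which is an
    assumption made for illustration.
    """
    index = protein.find(motif)
    if index == -1:
        return None
    cut = index + len(motif)
    return protein[:cut], protein[cut:]


# Illustrative substrate: VDDDDK (from the text) followed by an assumed IVGG start.
example_substrate = "VDDDDKIVGG"

result = cleave_after_motif(example_substrate)
if result is not None:
    activation_peptide, mature_start = result
    print(f"activation peptide: {activation_peptide}")
    print(f"new amino terminus: {mature_start}")
```

Passing a different `motif` argument would let the same function be tried against the variant activation sequences, although whether enterokinase accepts them equally well is not addressed by this sketch.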

Enterokinase has been purified from porcine (11), bovine 
(10, 12, 13), human (14), and ostrich intestine (15). With the 
possible exception of human enterokinase, wliich was sug- 
gested to be a heterotrimer (14), enterokinase ^pears to be 
a disulfide-linked heterodimer with a heavy chain of 82-140 
kDa and a Ught chain of 35-62 kDa. Mammalian enteroki- 
nases contain 30-50% cart>ohydrate, which may contribute to 
the cq>parent differences in polypeptide masses. The heavy 
chain is postulated to mediate association with the intestinal 
brush border membrane (16), although no direct evidence for 
this function has been reported. The light chain contains the 
catalytic center. Based on susceptibility to inhibition by 
chemical modification of the active-site serine and histidine 
residues (9-11, 17) and on the partial amino acid sequence 
(18) and cDNA sequence of the bovine enterokinase light 
chain (19), enterokinase is a member of the trypsin-like family
of serine proteases. 

Enterokinase stands at or near the top of a regulatory 
enzyme cascade that successfully limits the activity of diges- 
tive hydrolases to the gut, but there is no structural expla- 
nation for enterokinase membrane localization, substrate 
specificity, or expression specifically in the proximal small 
intestine. To address these questions we have characterized 
cDNA clones for bovine and human enterokinase.

MATERIALS AND METHODS 

Materials. Purified calf enterokinase (EK-3, 131 units/µg)
was from Biozyme Laboratories (San Diego). Fresh bovine 
tissues were from a local abattoir. 

Amino Acid Sequencing. Enterokinase (16 µg) was reduced
with 0.5% (vol/vol) 2-mercaptoethanol, separated by electrophoresis
(20), transferred to an Immobilon P membrane
(Millipore) by electroblotting, and stained with Coomassie
brilliant blue. The excised light-chain band (≈47 kDa) was
subjected to automated Edman degradation with an Applied
Biosystems model 470A sequencer (21) equipped with a
model 120A phenylthiohydantoin analyzer.



To whom reprint requests should be addressed at: Howard Hughes
Medical Institute, Washington University, 660 South Euclid, Box
8022, St. Louis, MO 63110.

The sequences reported in this paper have been deposited in the
GenBank data base (accession nos. U09859 and U09860). 






Biochemistry: Kitamoto et al.

Proc. Natl. Acad. Sci. USA 91 (1994) 7589



Isolation of cDNA Clones. RNA was extracted (22) from
bovine duodenum and proximal small intestine. Single-stranded
cDNA was prepared from total RNA (10 µg) using
avian myeloblastosis virus reverse transcriptase and an oligo(dT)
primer (cDNA cycle kit, Invitrogen). The cDNA was
used for PCR amplification (30 cycles of 2-min annealing at
59°C, 2-min extension at 72°C, and 1-min denaturation at
[illegible]) with sense primer 5'-TAY GAR GGI GCI TGG CCI
TCG GT-3' and antisense primer 5'-AAT GGG ACC CKXT
IGA RTC ICC-3'. Products were analyzed by Southern
blotting and hybridization with 32P-labeled oligonucleotide
probe 5'-Sn WCI GCI GCC CAC TG-3'. The positive 572-bp
product was cloned to yield pBEK1.

Additional clones were obtained by radiolabeling the cDNA
insert of pBEK1 with [32P]dCTP (23) and screening of bovine
or human small intestine λgt11 cDNA libraries (Clontech) or
by using oligonucleotides to screen 5' rapid amplification of
cDNA ends (RACE) libraries (24). RACE libraries were
constructed with the 5' RACE system (GIBCO/BRL) using
bovine intestinal RNA and one of two sets of enterokinase-specific
primers: set 1, 5'-TTA TTG TCTF TCA TCA GAG
CCA TC-3', 5'-TGG ACA GTT TAA TTC TCC ATC ACA-3',
5'-ATC AAT TGC TAT GTA CTT TAG AGC-3'; set 2,
5'-ATT GAG ACA TTT CCT GTG ATA TCA ATG CrrG-3',
5'-TGT GGA AAG TGA CCA GTT GGC TGG ATT TAT-3',
5'-GCC TTG AAT CAG TTC TTC TT-3'. DNA sequences
were determined on both strands (25).

DNA Sequence Analysis. Sequences were compared to 
GenBank and EMBL data bases at the National Center for 
Biotechnology Information using the BLAST network server 
(26). Sequence alignments and consensus sequences were 
prepared and analyzed with the programs PILEUP and GAP of
the Genetics Computer Group (version 7.1, Madison, WI).
The significance of GAP alignments was evaluated by comparing
the optimal alignment score (x) to the mean (μ) and SD
(σ) of scores obtained for 30 alignments of randomized
sequences, using the normal distribution to estimate the 
probability that the alignment could occur by chance. 
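
The shuffle test described above can be sketched as follows. This is a minimal illustration, not the GCG implementation: the scoring function `identity_score` is a toy stand-in for a real aligner, and the use of the sample standard deviation and a one-sided upper tail are assumptions, since the text states only that the normal distribution was used.

```python
import math
import random
import statistics


def identity_score(a: str, b: str) -> int:
    """Toy ungapped score: one point per identical position.
    A hypothetical stand-in for a real alignment score."""
    return sum(1 for x, y in zip(a, b) if x == y)


def shuffled_null(a: str, b: str, n: int = 30, seed: int = 0):
    """Score a against n random shuffles of b; return (observed, null_scores)."""
    rng = random.Random(seed)
    observed = identity_score(a, b)
    null_scores = []
    for _ in range(n):
        residues = list(b)
        rng.shuffle(residues)
        null_scores.append(identity_score(a, "".join(residues)))
    return observed, null_scores


def z_and_p(observed: float, null_scores):
    """Return z = (x - mu) / sigma and the upper-tail normal probability."""
    mu = statistics.mean(null_scores)
    sigma = statistics.stdev(null_scores)  # sample SD (assumption)
    if sigma == 0:
        raise ValueError("null scores have zero variance")
    z = (observed - mu) / sigma
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p
```

With only 30 shuffles the estimates of μ and σ are coarse, so very small probabilities obtained this way are best read as order-of-magnitude indications.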

RESULTS AND DISCUSSION 

Isolation of cDNA Clones. The bovine enterokinase light 
chain was reported to contain the motif XEGAWPWVV at
residues 8-16 (18); the underlined residues are not conserved 
in other serine proteases. Thirty-one residues of the amino- 
terminal sequence of the bovine enterokinase light chain were 
determined, and the previously reported sequence was con- 
firmed, except that arginine rather than tyrosine was identi- 
fied at cycle 8. This sequence was used to design a degenerate 
23-mer "sense" primer that would be relatively specific for
enterokinase. A degenerate 21-mer "antisense" primer was
based on the conserved GDSGGPL motif that contains the 
active-site serine of serine proteases. Upon PCR with a 
bovine small intestine single-stranded cDNA template, the 
major product hybridized to a probe based on the conserved
sequence near the active-site histidine. The corresponding
clone pBEK1 was used to isolate overlapping cDNAs from
bovine and human small intestine cDNA libraries. 
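
As a quick check on the primer design, the length and degeneracy of a primer written in IUPAC codes can be counted directly. The sketch below uses the sense primer exactly as printed in Materials and Methods; treating inosine (I) as a single non-degenerate base is an assumption made for the count.

```python
# Number of nucleotides each IUPAC symbol stands for.
# Inosine (I) is counted as 1 here (assumption: it pairs broadly
# and so does not multiply the number of distinct oligonucleotides).
IUPAC_FOLD = {
    "A": 1, "C": 1, "G": 1, "T": 1, "I": 1,
    "R": 2, "Y": 2, "K": 2, "M": 2, "S": 2, "W": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def primer_stats(primer: str):
    """Return (length, degeneracy) for a primer written in IUPAC codes."""
    bases = primer.replace(" ", "").upper()
    degeneracy = 1
    for base in bases:
        degeneracy *= IUPAC_FOLD[base]
    return len(bases), degeneracy


sense = "TAY GAR GGI GCI TGG CCI TCG GT"
length, fold = primer_stats(sense)
print(length, fold)  # 23 4
```

The count of 23 agrees with the "23-mer" stated in the text; the 4-fold degeneracy comes from the single Y and single R positions.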

The composite cDNA sequence for bovine enterokinase 
spans 3923 nt. Beginning at nt 113 there is an ATG codon and 
open reading frame of 3105 nt, a stop codon plus 3' untranslated
region of 643 nt, and a poly(A) tail of 63 nt. A polyadenylylation
signal of AATAAA is present 25 nt before the
poly(A) tail. The open reading frame encodes a polypeptide of 
1035 amino acids with a calculated mass of 114.9 kDa. The 
translated amino acid sequence after residue 800 (Fig. 1) was 
identical to the 31 residues determined by Edman degradation 
of the enterokinase light chain, confirming that the cDNA 
encodes enterokinase. A segment of 81 nt that encodes amino
acid residues Ala-166-Pro-192 was present in three cDNA 



clones but absent in one (Fig. 1). This sequence is not 
delimited by splice sites and therefore may be encoded by an 
exon that is occasionally absent due to alternative splicing. 
This segment also could represent a length polymorphism. 
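
The figures quoted for the bovine cDNA are mutually consistent, which can be verified with a few lines of arithmetic. The function name `cdna_layout` is illustrative only; every number comes from the paragraph above.

```python
def cdna_layout(start_nt: int, orf_nt: int, stop_plus_3utr_nt: int, poly_a_nt: int):
    """Return (5' UTR length, total length, codons, leftover nt) for a cDNA."""
    five_prime_utr = start_nt - 1          # nucleotides before the ATG
    total = five_prime_utr + orf_nt + stop_plus_3utr_nt + poly_a_nt
    codons, remainder = divmod(orf_nt, 3)  # ORF must be a multiple of 3
    return five_prime_utr, total, codons, remainder


utr5, total, residues, rem = cdna_layout(113, 3105, 643, 63)
mean_residue_mass = 114.9e3 / residues     # Da per residue
segment_residues = 192 - 166 + 1           # Ala-166 through Pro-192

print(utr5, total, residues, rem)          # 112 3923 1035 0
print(round(mean_residue_mass, 1))         # 111.0
print(segment_residues, segment_residues * 3)  # 27 81
```

The parts sum to the stated 3923 nt, the open reading frame gives exactly 1035 codons, and the 27-residue variable segment corresponds to the stated 81 nt.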

The partial cDNA sequence for human enterokinase corresponds
to amino acids 765-1035 encoded by the bovine
sequence. In the region of overlap, the open reading frames
of the bovine and human nucleotide sequences are ≈85%
identical, and the encoded amino acid sequences are ≈84%
identical. The 3' untranslated regions are less conserved,
exhibiting ≈67% sequence identity over 572 nt.

By Northern blotting, an enterokinase mRNA species of 
≈4.4 kb was detected in human small intestine, but not in
leukocytes, colon, ovary, testis, prostate, thymus, spleen, 
pancreas, kidney, skeletal muscle, liver, lung, placenta, 
brain, or heart (data not shown). This result is consistent with 
the studies of Pavlov on the distribution of enterokinase (2) 
and the immunohistochemical localization of enterokinase in 
the brush border of duodenum and jejunum (27).

Structure of the Enterokinase Catalytic Domain. In agreement
with LaVallie et al. (19), amino acid residues 801-1035
correspond to the enterokinase light chain, which has a 
predicted mass of 26.3 kDa, compared with 47 kDa observed 
for purified bovine intestinal enterokinase (data not shown). 
The difference reflects glycosylation of the light chain. There 
are three and four potential N-linked glycosylation sites, 
respectively, in the bovine and human enterokinase light 
chains, and digestion of bovine enterokinase with peptide:N- 
glycosidase F reduces the apparent mass of the light chain 
from 47 kDa to 35 kDa (data not shown).
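
Potential N-linked glycosylation sites are conventionally identified as Asn-X-Ser/Thr sequons in which X is not proline. A minimal scanner is sketched below; the test sequence `toy` is hypothetical and is not taken from enterokinase.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. N-N-T-S) both be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein: str):
    """Return 1-based positions of Asn residues in N-X-S/T sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


toy = "MNGSAANPTQNNTS"  # hypothetical sequence for illustration
print(sequon_positions(toy))  # [2, 11, 12]
```

A sequon is only a potential site; whether a given asparagine is glycosylated has to be determined experimentally, as with the peptide:N-glycosidase F digestion described above.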

The enterokinase protease domain was compared with 
other serine proteases for characteristic disulfide bond pat- 
terns and sequence similarity. Enterokinase is most similar to 
a subfamily of two-chain serine proteases that share 10 
conserved cysteine residues and in which the activation 
peptide remains attached to the protease domain by a disul- 
fide bond. The archetype of this group is chymotrypsin. By 
analogy to chymotrypsin (28, 29) and related proteases for 
which the disulfide bonds have been determined directly, the 
most likely pairings in enterokinase are as follows: Cys-788-Cys-912,
Cys-826-Cys-842, Cys-926-Cys-993, Cys-957-Cys-972,
and Cys-983-Cys-1011. The first of these disulfide bonds
joins the heavy chain and light chain. 

The amino acid sequence of the enterokinase protease 
domain is strikingly similar to the blood coagulation proteases
factor XI (30) and prekallikrein (31) and to hepsin, an
unusual serine protease with a possible transmembrane domain
near the amino terminus (32). Enterokinase exhibits the
expected conservation of serine protease sequence motifs; in 
particular, the active-site residues can be identified as His- 
841, Asp-892, and Ser-987 (Fig. 1). Compared with factor XI, 
hepsin, and chymotrypsin, the human enterokinase light 
chain has 41%, 44%, and 35% identical amino acid residues. 
The percentages for the bovine enterokinase comparisons are 
similar. Enterokinase and factor XI appear to share two 
potential N-linked glycosylation sites, whereas hepsin has no 
N-linked glycosylation sites. 
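
Percent identity between two rows of an alignment can be computed as sketched below. The text does not say how gapped columns were counted, so skipping any column that contains a gap is an assumption, and the two short fragments in the example are illustrative rather than quoted from Fig. 1.

```python
GAP_CHARS = {".", "-"}


def percent_identity(row_a: str, row_b: str) -> float:
    """Percent identical residues over ungapped columns of two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(row_a, row_b):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # assumption: gapped columns are not counted
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


print(percent_identity("GDSGGPLMCQ", "GDSGGPLSCK"))  # 80.0
```

Different denominators (alignment length, shorter sequence length, ungapped columns) give different percentages, so identity values are comparable only when computed the same way.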

The specificity of enterokinase for cleavage after lysine is 
consistent with the presence of Asp-981 at the base, and 
Gly-1008 and Gly-1018 at the sides of the specificity pocket 
or S1 subsite that binds the substrate P1 residue (Fig. 1). The
requirement for aspartate in the P2-P5 positions suggests that 
the surface of enterokinase should provide electrostatic com- 
plementarity to negatively charged side chains. Examination 
of the homologous three-dimensional structure of chymo- 
trypsin suggests that several exposed surface loops of enter- 
okinase (Fig. 1, segments a-d) might contact these substrate 
residues. Within these segments, there are a few positively 
charged residues that are present in both bovine and human
enterokinase but absent from related proteases with different






specificity for the P2-P5 substrate residues. In particular, the 
RRRK (human) or KRRK (bovine) sequences between res- 
idues 886-889 (Fig. 1, segment b) may interact directly with 
the aspartate residues in enterokinase substrates. 

The synthesis of enterokinase as a single-chain protein 
poses a conceptual problem because it indicates that "proenterokinase"
itself must be activated by proteolytic cleavage.
The responsible protease could act on proenterokinase intracellularly
during biosynthesis or extracellularly. Although
the reaction could be autocatalytic, the participation of a 
separate protease seems more likely. In that case, enteroki- 
nase would not be strictly at the top of the digestive hydrolase 
cascade but would be in the second position at best. The 
amino-terminal isoleucine of the enterokinase light chain is 
preceded by Ser-Pro-Lys (bovine) or Thr-Pro-Lys (human),
suggesting that enterokinase is activated by a trypsin-like 
enzyme. The identity and location of the proenterokinase 
activator may indicate another level in the control of diges- 
tion. 

Structural Motifs of the Enterokinase Heavy Chain. The
nucleotide sequence around the codon for Met-1 is
AAAATGG, and that for Met-20 is GTCATGT. Only the
former sequence matches at both positions -3 and +4 the 
consensus sequence proposed for translation initiation in 
vertebrate mRNAs (33), suggesting that initiation at Met-1 is 



more likely. There is no in-frame termination codon within
the available 112 nt of putative 5' untranslated sequence, so 
it is possible that the initiation codon remains to be cloned. 
However, initiation at Met-1 predicts a bovine enterokinase 
heavy chain of 800 amino acids with a mass of 88.6 kDa (Fig. 
1), and this is consistent with the ≈763 amino acids and ≈84
kDa estimated by compositional analysis of purified enterokinase
(12). By SDS/gel electrophoresis, the apparent mass
of the heavy chain was ≈116 kDa, decreasing to ≈82 kDa
after removal of N-linked oligosaccharides with peptide:N- 
glycosidase F (data not shown). This decrease in mass is 
consistent with the reported carbohydrate composition of 
enterokinase (10, 12), and there are 17 potential N-linked 
glycosylation sites in the sequence of the heavy chain (two 
are concatenated) (Fig. 1). 
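
The comparison of the two candidate initiation codons at the start of this section can be reproduced with a short check of the two positions named in the text: a purine at -3 and a G at +4, counting the A of ATG as +1. The sketch takes the two heptamers as AAAATGG (Met-1) and GTCATGT (Met-20), which is an assumption about the printed sequences, and the function name `kozak_flags` is illustrative.

```python
PURINES = {"A", "G"}


def kozak_flags(context: str, codon: str = "ATG"):
    """Return (purine at -3, G at +4) for the first ATG in context.

    Positions follow the usual convention: the A of ATG is +1,
    so -3 is three bases upstream and +4 follows the G of ATG.
    """
    seq = context.upper()
    i = seq.find(codon)
    if i < 3 or i + 3 >= len(seq):
        raise ValueError("need 3 nt upstream and 1 nt downstream of ATG")
    return seq[i - 3] in PURINES, seq[i + 3] == "G"


print(kozak_flags("AAAATGG"))  # (True, True)
print(kozak_flags("GTCATGT"))  # (True, False)
```

Under this reading only the Met-1 context satisfies both positions, which matches the conclusion that initiation at Met-1 is more likely.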

The hydrophobic 29-residue sequence from Val-19 through 
Val-47 could serve as a signal peptide. If it were not cleaved 
by signal peptidase, this segment could function as a signal- 
anchor sequence and account for the membrane association 
of enterokinase. The amino-terminal sequence also is com- 
patible with the substrate specificity of myristoylCoA:pro- 
tein N-myristoyltransferase (34), suggesting that Gly-2 may 
be myristoylated and thereby provide another mechanism for 
membrane targeting during biosynthesis. 

The heavy chain of enterokinase contains five domains that 
are related to four different structural motifs found in other 



[Fig. 1: aligned amino acid sequences of EKhu, EKbov, FXI, Hepsin, Chta, and the consensus, numbered by the bovine enterokinase sequence; the sequence panels are not recoverable from the scan.]

Fig. 1. Translated amino acid sequence of enterokinase cDNA clones and alignment with other serine proteases. The aligned sequences
include human enterokinase (EKhu), bovine enterokinase (EKbov), human factor XI (FXI), human hepsin (Hepsin), bovine chymotrypsinogen
A (Chta), and a consensus sequence. Numbering at right refers to the translated sequence of bovine enterokinase. Cysteine residues are in
boldface type. Potential N-linked glycosylation sites are in boldface underlined type. The potential signal-anchor sequence is double underlined.
The potential alternative exon is indicated by a dotted underline. Sequence motifs in the heavy chain are indicated by numbered underlines.
Segments of the protease domain that may interact with substrate amino acids are indicated by lettered underlines (a-d). The cleavage site for
zymogen activation (A), active site residues (♦), and residues in the specificity pocket or S1 subsite (*) are indicated below the consensus
sequence.






[Fig. 2: (A) schematic of enterokinase showing heavy chain domains 1 (LDLR), 2 (Meprin), 3 (C1r/s), 4 (LDLR), and 5 (MSCR) followed by the serine protease domain with residues H, D, and S; (B-E) domain alignments; the alignment panels are not recoverable from the scan.]

Fig. 2. Structural motifs in enterokinase. Numbers in parentheses refer to the amino acid residues represented in each aligned sequence.
Bovine enterokinase (EK) residues are numbered as in Fig. 1. (A) Schematic structure of enterokinase, indicating the proposed signal-anchor
sequence (SA), alternative exon (AE), numbered heavy chain domains (LDLR, low-density-lipoprotein receptor; MSCR, macrophage scavenger
receptor), and the serine protease domain with active-site residues histidine (H), aspartate (D), and serine (S). The cleavage site between the heavy
and light chains (arrowhead) and disulfide bond connecting them are shown. (B) Alignment of EK domains 1 and 4 with cysteine-rich motifs
of the LDL receptor (LDLR) (35). (C) Alignment of EK domain 2 with segments of Xenopus laevis A5 antigen (A5xen) (36), mouse meprin A
(37), and rat meprin B (38). (D) Alignment of EK domain 3 with selected C1r/s-like domains of Drosophila melanogaster tolloid (39) and
complement component C1r (40). (E) Alignment of EK domain 5 with repeated domains of the mouse macrophage scavenger receptor type I
(MSCR) (41) and the speract crosslinking protein from sea urchin sperm (42). The significance of alignments was estimated as described under
Materials and Methods: EK1 or EK4 versus LDLR motifs, P < 10^[illegible]; EK2 versus meprin motifs, P < 10^[illegible]; EK3 versus C1r/s motifs, P <
10^[illegible]; EK5 versus MSCR motifs, P ≈ 3.7 × 10^[illegible].






protein families, indicating that enterokinase is a mosaic 
protein with a complex evolutionary history. The particular 
combination of motifs is specific and surprising (Fig. 2A).
Enterokinase domains 1 and 4 are homologous to an ≈40-amino
acid cysteine-rich repeat found in the amino-terminal
domain of the low-density lipoprotein receptor and also in 
several complement proteins (Fig. 2B) (35). 

Enterokinase domain 2 (Fig. 2C) is homologous to ≈110-amino
acid segments of meprins A and B, which are membrane-bound
metalloproteases of renal glomeruli (37, 38).
This domain also is homologous to a segment of the A5
protein of X. laevis (36), which may mediate neuronal recognition.
For this structural motif, identified in four distinct
vertebrate proteins, we propose the name "meprin domain."

Enterokinase domain 3 (Fig. 2D) is homologous to a family
of ≈120-amino acid repeats reported in complement serine
protease C1r (40) and subsequently found in many proteins
including the product of the Drosophila dorsal-ventral patterning
gene tolloid (39). Interestingly, tolloid also encodes a
separate metalloprotease domain that is homologous to the
metalloprotease domains of meprins A and B.

Enterokinase domain 5 (Fig. 2E) is homologous to ≈110-amino
acid cysteine-rich motifs that are found in the macrophage
scavenger receptor (41), the sea urchin spermatozoa
speract receptor (42), and several lymphocyte cell-surface
antigens (41). This domain in enterokinase is truncated at the
carboxyl end.

The structural domains of the enterokinase heavy chain are 
found in proteins of the complement cascade, in endocytic 
receptors for diverse ligands including lipoproteins, in pro- 
teins that regulate development, in receptors that contribute 
to the specificity of egg fertilization, and in proteins of 
unknown function. The particular combination of structural 
motifs observed in the enterokinase heavy chain is unprec- 
edented. The presence of potential ligand-binding domains 
suggests that interaction with other macromolecules, either 
in the cell membrane or in the lumen of the gut, might 
modulate enterokinase activation, substrate specificity, or 
inhibition. 

For nearly a century enterokinase has been known as the 
principal activator of digestive hydrolases, and the same 
basic regulatory mechanism appears to be conserved among 
all vertebrates. The physiologic importance of this mecha- 
nism is emphasized by the severe malabsorption that accom- 
panies human enterokinase deficiency (3, 4). The apparent 
requirement for proteolytic activation of proenterokinase 
suggests that yet another protease is required for the normal 
regulation of pancreatic zymogens. The isolation of cDNA 
clones for human and bovine enterokinase provides the 
means to address the regulation and structure-function re- 
lationships of this ancient, essential protease. 

We thank Dr. Anja Schweizer and Dr. Jack Rorer (Washington
University) for translations of articles from the original German, Dr.
Heidi Hope (Washington University) for assistance with protein 
blotting, Lisa Westfield (Howard Hughes Medical Institute) for 
synthesis of oligonucleotides, and Cecil Buchanan (Washington 
University) for assistance in obtaining fresh bovine tissues. We also 
thank Prof. Tatsuo Sato (Kumamoto University, Kumamoto, Japan) 
for his encouragement and support. This research was supported, in 
part, by National Institutes of Health Grant HL14147 (Specialized 
Center of Research in Thrombosis). 

1. Neurath, H. (1984) Science 224, 350-357.
2. Pavlov, I. P. (1902) The Work of the Digestive Glands (Charles Griffin, London), 1st Ed.
3. Hadorn, B., Tarlow, M. J., Lloyd, J. K. & Wolff, O. H. (1969) Lancet 1, 812-813.
4. Haworth, J. C., Gourley, B., Hadorn, B. & Sumida, C. (1971) J. Pediatr. 78, 481-490.
5. Davie, E. W. & Neurath, H. (1955) J. Biol. Chem. 212, 515-529.
6. Yamashina, I. (1956) Acta Chem. Scand. 10, 739-743.
7. Bricteux-Gregoire, S., Schyns, R. & Florkin, M. (1972) Comp. Biochem. Physiol. B 42, 23-39.
8. Abita, J. P., Delaage, M., Lazdunski, M. & Savrda, J. (1969) Eur. J. Biochem. 8, 314-324.
9. Maroux, S., Baratti, J. & Desnuelle, P. (1971) J. Biol. Chem. 246, 5031-5039.
10. Anderson, L. E., Walsh, K. A. & Neurath, H. (1977) Biochemistry 16, 3354-3360.
11. Baratti, J., Maroux, S., Louvard, D. & Desnuelle, P. (1973) Biochim. Biophys. Acta 315, 147-161.
12. Liepnieks, J. J. & Light, A. (1979) J. Biol. Chem. 254, 1677-1683.
13. Fonseca, P. & Light, A. (1983) J. Biol. Chem. 258, 14516-14520.
14. Magee, A. I., Grant, D. A. W. & Hermon-Taylor, J. (1981) Clin. Chim. Acta 115, 241-254.
15. Naude, R. J., Da Silva, D., Edge, W. & Oelofsen, W. (1993) Comp. Biochem. Physiol. B 105, 591-595.
16. Fonseca, P. & Light, A. (1983) J. Biol. Chem. 258, 3069-3074.
17. Baratti, J. & Maroux, S. (1976) Biochim. Biophys. Acta 452, 488-496.
18. Light, A. & Janska, H. (1991) J. Protein Chem. 10, 475-480.
19. LaVallie, E. R., Rehemtulla, A., Racie, L. A., DiBlasio, E. A., Ferenz, C., Grant, K. L., Light, A. & McCoy, J. M. (1993) J. Biol. Chem. 268, 23311-23317.
20. Laemmli, U. K. (1970) Nature (London) 227, 680-685.
21. Hewick, R. M., Hunkapiller, M. W., Hood, L. E. & Dreyer, W. J. (1981) J. Biol. Chem. 256, 7990-7997.
22. Chomczynski, P. & Sacchi, N. (1987) Anal. Biochem. 162, 156-159.
23. Feinberg, A. P. & Vogelstein, B. (1983) Anal. Biochem. 132, 6-13.
24. Frohman, M. A., Dush, M. K. & Martin, G. R. (1988) Proc. Natl. Acad. Sci. USA 85, 8998-9002.
25. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
26. Gish, W. & States, D. J. (1993) Nature Genet. 3, 266-272.
27. Hermon-Taylor, J., Perrin, J., Grant, D. A. W., Appleyard, A. & Magee, A. I. (1977) Gut 18, 259-265.
28. Hartley, B. S. & Kauffman, D. L. (1966) Biochem. J. 101, 229-231.
29. Brown, J. R. & Hartley, B. S. (1966) Biochem. J. 101, 214-228.
30. Fujikawa, K., Chung, D. W., Hendrickson, L. E. & Davie, E. W. (1986) Biochemistry 25, 2417-2424.
31. Chung, D. W., Fujikawa, K., McMullen, B. A. & Davie, E. W. (1986) Biochemistry 25, 2410-2417.
32. Leytus, S. P., Loeb, K. R., Hagen, F. S., Kurachi, K. & Davie, E. W. (1988) Biochemistry 27, 1067-1074.
33. Kozak, M. (1991) J. Cell Biol. 115, 887-903.
34. Rudnick, D. A., McWherter, C. A., Gokel, G. W. & Gordon, J. I. (1993) Adv. Enzymol. 67, 375-430.
35. Sudhof, T. C., Goldstein, J. L., Brown, M. S. & Russell, D. W. (1985) Science 228, 815-822.
36. Takagi, S., Hirata, T., Agata, K., Mochii, M., Eguchi, G. & Fujisawa, H. (1991) Neuron 7, 295-307.
37. Jiang, W., Gorbea, C. M., Flannery, A. V., Beynon, R. J., Grant, G. A. & Bond, J. S. (1992) J. Biol. Chem. 267, 9185-9193.
38. Johnson, G. D. & Hersh, L. B. (1992) J. Biol. Chem. 267, 13505-13512.
39. Shimell, M. J., Ferguson, E. L., Childs, S. R. & O'Connor, M. B. (1991) Cell 67, 469-481.
40. Leytus, S. P., Kurachi, K., Sakariassen, K. S. & Davie, E. W. (1986) Biochemistry 25, 4855-4863.
41. Freeman, M., Ashkenas, J., Rees, D. J. G., Kingsley, D. M., Copeland, N. G., Jenkins, N. A. & Krieger, M. (1990) Proc. Natl. Acad. Sci. USA 87, 8810-8814.
42. Dangott, L. J., Jordan, J. E., Bellet, R. A. & Garbers, D. L. (1989) Proc. Natl. Acad. Sci. USA 86, 2128-2132.




Exhibit 18



4562 Biochemistry 1995, 34, 4562-4568 

cDNA Sequence and Chromosomal Localization of Human Enterokinase, the Proteolytic Activator of Trypsinogen

Yasunori Kitamoto, Rosalie Ann Veile, Helen Donis-Keller, and J. Evan Sadler*

Howard Hughes Medical Institute, Division of Human Molecular Genetics, Department of Surgery,
Division of Hematology-Oncology, Department of Medicine, and Department of Biochemistry & Molecular Biophysics,
The Jewish Hospital of St. Louis, Washington University School of Medicine, St. Louis, Missouri 63110

Received January 4, 1995

ABSTRACT: Enterokinase is a serine protease of the duodenal brush border membrane that cleaves
trypsinogen and produces active trypsin, thereby leading to the activation of many pancreatic digestive
enzymes. Overlapping cDNA clones that encode the complete human enterokinase amino acid sequence
were isolated from a human intestine cDNA library. Starting from the first ATG codon, the composite
3696 nt cDNA sequence contains an open reading frame of 3057 nt that encodes a 784 amino acid heavy
chain followed by a 235 amino acid light chain; the two chains are linked by at least one disulfide bond.
The heavy chain contains a potential N-terminal myristoylation site, a potential signal anchor sequence
near the amino terminus, and six structural motifs that are found in otherwise unrelated proteins. These
domains resemble motifs of the LDL receptor (two copies), complement component C1r (two copies),
the metalloprotease meprin (one copy), and the macrophage scavenger receptor (one copy). The 
enterokinase light chain is homologous to the trypsin-like serine proteinases. These structural features 
are conserved among human, bovine, and porcine enterokinase. By Northern blotting, a 4.4 kb enterokinase 
mRNA was detected only in small intestine. The enterokinase gene was localized to human chromosome 
21q21 by fluorescence in situ hybridization. 







Enterokinase was discovered by N. P. Schepovalnikov,
in I. P. Pavlov's laboratory, as an activity of small intestinal
mucosa that dramatically increased the proteolytic activity
of pancreatic fluid (Pavlov, 1902). Enterokinase later was
shown to be an enzyme (Kunitz, 1939) that cleaves the
amino-terminal activation peptide from trypsinogen to
produce trypsin (Davie & Neurath, 1955; Yamashina, 1956).
This reaction permits the subsequent activation of other
pancreatic zymogens by trypsin. The physiologic importance
of this two-step proteolytic cascade is indicated by the
intestinal malabsorption that is caused by congenital
deficiency of enterokinase (Hadorn et al., 1969; Haworth et al.,
1971).

Enterokinase has been purified from bovine (Anderson et
al., 1977; Liepnieks & Light, 1979; Fonseca & Light, 1983),
porcine (Baratti et al., 1973), human (Magee et al., 1981),
and ostrich intestine (Naude et al., 1993). In most
preparations, enterokinase appears to be a disulfide-linked
heterodimer composed of an 82-140 kDa heavy chain and a
35-62 kDa light chain, although a trimeric structure also
has been proposed for human (Magee et al., 1981) and



Supported in part by National Institutes of Health Grants HL14147
(J.E.S., Y.K.) and HG00469 (R.A.V., H.D.K.).

The DNA sequence (Figure 2) was deposited in the GenBank
database under Accession Number U09860.

* Address correspondence to this author at the Howard Hughes
Medical Institute, 660 South Euclid Ave., Box 8022, St. Louis, MO
63110.

Division of Hematology-Oncology, Department of Medicine.

Present address: Third Department of Internal Medicine,
Kumamoto University School of Medicine, 1-1-1 Honjo, Kumamoto 860,
Japan.

Division of Human Molecular Genetics, Department of Surgery.
Howard Hughes Medical Institute and Department of Biochemistry
and Molecular Biophysics. The Jewish Hospital of St. Louis.
Abstract published in Advance ACS Abstracts, April 1, 1995.



porcine (Matsushima et al., 1994) enterokinase. Both chains
of mammalian enterokinases contain 30-50% carbohydrate.

Recently, the full-length amino acid sequences of bovine
(LaVallie et al., 1993; Kitamoto et al., 1994) and porcine
(Matsushima et al., 1994) enterokinase and a partial sequence
of human enterokinase (Kitamoto et al., 1994) were
determined indirectly by cDNA cloning. Active enterokinase
appears to be a two-chain protein derived from a
single-chain precursor. The putative heavy chain contains a
hydrophobic potential signal-anchor sequence near the amino
terminus, as well as several domains that are homologous
to structural motifs found in other proteins. The light chain
contains the catalytic center, and enterokinase is a member
of the trypsin-like family of serine proteases.

Many facts remain unknown concerning the structure and
function of enterokinase. Although enterokinase appears to
be an intrinsic membrane protein, the mechanism of
membrane association is unknown. Furthermore, single-chain
proenterokinase is proteolytically cleaved to generate active
two-chain enterokinase, but the enzyme that is responsible
for proenterokinase activation has not been identified.

To facilitate the study of human enterokinase membrane
localization and zymogen activation, we have characterized
cDNA clones that encode the complete amino acid sequence
of human proenterokinase. These clones were employed to
localize the human enterokinase gene to human chromosome
21q21 by fluorescence in situ hybridization.

EXPERIMENTAL PROCEDURES 

Isolation of cDNA Clones. The partial human enterokinase
cDNA insert contained in plasmid pHEK6 (Kitamoto et al.,
1994) was labeled with [32P]dCTP by a random primer
method (Feinberg & Vogelstein, 1983) and employed to







Human Enterokinase

Biochemistry, Vol. 34, No. 14, 1995 4563

[Figure 1 graphic. Domain labels: SA; LDLR (1); C1r/s (2); MAM (3); C1r/s (4); LDLR (5); MSCR (6); Serine Protease (H, D, S); S-S. Clone labels: HEK27, HEK19a, HEK18, HEK12, HEK6, HEK3, HEK1, HEK5. Scale bar: 1 kb.]

Figure 1: Domain structure of human enterokinase and map of enterokinase cDNA clones. The structure of the enterokinase cDNA is
indicated schematically at the top. The 5' and 3' untranslated regions are indicated by thin lines at the extreme left and right ends. The
locations are indicated for a proposed signal-anchor domain (SA) and serine protease domain with active site histidine (H), aspartate (D),
and serine (S) residues. The locations are shown of the cleavage site between the heavy and light chains (arrowhead) and of the predicted
disulfide bond that connects them. The enterokinase heavy chain contains repeated motifs (numbered 1-6) that are homologous to domains
of other proteins: LDLR, a low-density lipoprotein receptor cysteine-rich repeat (Sudhof et al., 1985); C1r/s, a repeat type found in complement
components C1r and C1s (Leytus et al., 1986) and also found in the Drosophila dorsal-ventral patterning gene tolloid (Shimell et al.,
1991); MAM, a domain homologous to members of a family defined by motifs in the mammalian metalloprotease meprin, the X. laevis
neuronal protein A5, and the protein tyrosine phosphatase μ (Beckmann & Bork, 1993); MSCR, a macrophage scavenger receptor cysteine-
rich motif (Freeman et al., 1990) also found in sea urchin spermatozoa speract receptor (Dangott et al., 1989). The relationships among
eight overlapping cDNA clones are indicated. The scale in kilobases (kb) of DNA is indicated at the bottom left.



screen a human small intestine cDNA library in the
bacteriophage λgt11 vector (Clontech). The cDNA inserts of
plaque-purified isolates were subcloned into plasmid
pBluescript M13+ or pBluescript II KS+ (Stratagene) for DNA
sequencing (Sanger et al., 1977).

DNA Sequence Analysis. Sequences were compared to
GenBank and EMBL data bases at the National Center for
Biotechnology Information using the BLAST network server
(Gish & States, 1993). Sequence alignments and consensus
sequences were prepared and analyzed with the programs
pileup, gap, and pretty of the Genetics Computer Group
(version 7.1, Madison, WI) as described previously
(Kitamoto et al., 1994).

Northern Blotting. The insert of human enterokinase
cDNA clone HEK1 or human β-actin (Gunning et al., 1983)
was labeled with [32P]dCTP (Feinberg & Vogelstein, 1983).
A Northern blot of poly(A)+ RNA (Clontech) from assorted
human tissues (2 μg/lane) was hybridized (Sambrook et al.,
1989) with the radiolabeled HEK1 insert (1 x 10^[...] cpm/mL)
and washed three times for 15 min at room temperature in
2 x SSC and 0.05% SDS (1 x SSC is 15 mM sodium
citrate, pH 7.0, 0.15 M NaCl). The final stringent wash
condition was 50 °C, 15 min, in 0.1 x SSC and 0.1% SDS.
The blot was exposed to X-ray film for 10 days. The blot
was stripped of radiolabeled HEK1 by immersion in 0.5%
SDS for 10 min at 100 °C. The stripped blot was hybridized
with the radiolabeled β-actin probe, washed as described
above, and exposed to X-ray film for 2 h.
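
The two wash buffers are stated as multiples of 1 x SSC, so their salt concentrations follow by simple scaling. The short Python sketch below is only an illustration of that arithmetic; the function name `ssc_concentrations` and the variable names are hypothetical and not part of the published methods.

```python
from typing import Dict

# 1 x SSC as defined in the text: 15 mM sodium citrate, 0.15 M (150 mM) NaCl.
SSC_1X_MM: Dict[str, float] = {"sodium citrate": 15.0, "NaCl": 150.0}


def ssc_concentrations(fold: float) -> Dict[str, float]:
    """Return millimolar solute concentrations for a given multiple of 1 x SSC."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    # Rounding only guards against floating-point noise in the scaled values.
    return {solute: round(mm * fold, 6) for solute, mm in SSC_1X_MM.items()}


# Room-temperature washes (2 x SSC) and the final stringent wash (0.1 x SSC).
room_temp_wash = ssc_concentrations(2.0)
final_wash = ssc_concentrations(0.1)

if __name__ == "__main__":
    print("2 x SSC  :", room_temp_wash)
    print("0.1 x SSC:", final_wash)
```

The lower salt concentration of the final wash, together with the higher temperature, is what makes it the more stringent step.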

Gene Mapping by in Situ Hybridization. Fluorescence in
situ hybridization was performed as described (Lichter et
al., 1988). Human prometaphase chromosome spreads were
prepared from cultured phytohemagglutinin-stimulated
peripheral blood leukocytes from a male with a normal
karyotype (46XY). Extended chromosomes were produced



Abbreviations: kb, kilobase; nt, nucleotide; SSC, standard saline
citrate (15 mM sodium citrate, pH 7.0, 0.15 M NaCl); SDS, sodium
dodecyl sulfate.



by colchicine treatment (Yunis, 1976). Plasmids pHEK1 and
pHEK6 contain the human enterokinase cDNA inserts of
bacteriophage λgt11 isolates HEK1 and HEK6, respectively,
cloned into plasmid pBluescript M13+. Equal amounts of
pHEK1 and pHEK6 were mixed, and ≈150 ng of DNA was
labeled with biotin-11-dUTP by nick translation (Rigby et
al., 1977). The biotinylated product was hybridized to human
chromosomal spreads (Lichter et al., 1988). To detect sites
of hybridization, slides were incubated sequentially with
fluorescein isothiocyanate-conjugated avidin DCS (5 μg/mL)
and fluorescein isothiocyanate-conjugated goat anti-avidin
D antibodies (5 μg/mL), followed by counterstaining with
4,6-diamidino-2-phenylindole dihydrochloride (200 ng/mL)
and propidium iodide (200 ng/mL). After fluorescent
hybridization, cytogenetic banding patterns were visualized
by staining with Giemsa.

RESULTS AND DISCUSSION 

Isolation of cDNA Clones. A human small intestine λgt11
cDNA library was screened with the insert of a partial human
enterokinase cDNA clone, HEK6 (Kitamoto et al., 1994).
Seven positives were identified among 1.5 x 10^[...] plaques
screened. Clones HEK12, HEK18, and HEK19a were
characterized further by restriction mapping and sequencing
(Figure 1). The cDNA insert of HEK19a was employed to
rescreen the library, and the longest clone obtained (HEK27)
was sequenced.

The composite cDNA sequence of human enterokinase
(Figure 2) was determined on both strands. Beginning at nt
41 there is an ATG codon and open reading frame of 3057
nt, followed by a stop codon and 3' noncoding region of
599 nt. The open reading frame encodes a polypeptide of
1019 amino acids with a calculated mass of 112.9 kDa. The
coding regions of the human and bovine (Kitamoto et al.,
1994) nucleotide sequences are ≈85% identical, and the
encoded amino acid sequences are ≈82% identical. The 3'
noncoding regions are less conserved, with ≈67% identity
between human and bovine enterokinase cDNA sequences






[Figure 2 graphic: nucleotide sequence of the composite human enterokinase cDNA (numbered to nt 3696) with the translated amino acid sequence (numbered to residue 1019) printed beneath it. The sequence was deposited in the GenBank database under Accession Number U09860.]




Figure 2: Nucleotide and translated amino acid sequence of human enterokinase. Numbering at the right indicates the nucleotide or amino
acid residue at the end of each line. Amino acids are shown in single-letter code. The termination codon is shown by an asterisk (*). The
sequences contained in individual cDNA clones are as follows: HEK27, nt 1-2362; HEK19a, nt 948-2139; HEK18, nt 1451-2788;
HEK12, nt 1591-3045; HEK6, nt 1762-2714; HEK3, nt 2278-2714; HEK1, nt 2454-3668; HEK5, nt 2511-3969.
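
The clone coordinates in this caption can be checked for gap-free coverage of the composite sequence by merging the intervals. The following Python sketch is an illustration only, not the authors' procedure; the helper names are hypothetical. The HEK5 end coordinate is used as printed in this copy even though it exceeds the 3696 nt composite length, so every interval is clipped to that length before merging.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # inclusive, 1-based nucleotide coordinates

COMPOSITE_NT = 3696  # composite cDNA length stated in the abstract

# Clone coordinates as listed in the Figure 2 caption of this copy.
CLONES: Dict[str, Interval] = {
    "HEK27": (1, 2362),
    "HEK19a": (948, 2139),
    "HEK18": (1451, 2788),
    "HEK12": (1591, 3045),
    "HEK6": (1762, 2714),
    "HEK3": (2278, 2714),
    "HEK1": (2454, 3668),
    "HEK5": (2511, 3969),  # as printed; clipped below
}


def clip(interval: Interval, limit: int = COMPOSITE_NT) -> Interval:
    """Clip an interval so that it does not extend past the composite length."""
    start, end = interval
    return (start, min(end, limit))


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent inclusive intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_length(intervals: List[Interval]) -> int:
    """Total number of nucleotides covered by the union of the intervals."""
    return sum(end - start + 1 for start, end in merge_intervals(intervals))


all_clones = merge_intervals([clip(iv) for iv in CLONES.values()])
probe_pair_nt = covered_length([CLONES["HEK1"], CLONES["HEK6"]])

if __name__ == "__main__":
    print("union of all clones:", all_clones)
    print("nt covered by HEK1 + HEK6:", probe_pair_nt)
```

Under these assumptions the eight clones merge into a single interval spanning the composite sequence, and the HEK1 and HEK6 inserts that were later mixed as a hybridization probe together span nt 1762-3668.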






[Figure 3 graphic: three-way alignment of the human (Hek), bovine (Bek), and porcine (Pek) enterokinase amino acid sequences, with numbered underlines marking heavy-chain domains 1-6. The final residue numbers are 1019 (Hek), 1035 (Bek), and 1034 (Pek).]




Figure 3: Alignment of human (Hek), bovine (Bek) (Kitamoto et al., 1994), and porcine (Pek) (Matsushima et al., 1994) enterokinase
amino acid sequences. Amino acids are shown in single-letter code. Residues that are identical in all three species are in capital letters;
unconserved residues are in lower case. Numbering at the right refers to the translated amino acid sequence of each species of enterokinase.
Cysteine residues are in boldface type. Potential N-linked glycosylation sites are in boldface underlined type. The potential signal anchor
sequence is double underlined. The location of a potential alternatively spliced exon in bovine enterokinase is indicated by a dotted underline.
This segment is notably variable among the aligned species. Sequence motifs in the heavy chain are indicated by numbered underlines that
correspond to the domains shown in Figure 1.



over 599 nt. A similar degree of sequence identity is
apparent when either the human or bovine enterokinase
sequences are compared to the porcine enterokinase cDNA
sequence (Matsushima et al., 1994).
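
The sequence figures quoted above are mutually consistent, which a few lines of arithmetic can confirm. The Python sketch below is only an illustrative check; it assumes that the 599 nt 3' region is counted from the first nucleotide after the open reading frame (that is, that it includes the stop codon), which is the reading under which the parts add up to the 3696 nt composite length. All names are hypothetical.

```python
# Figures quoted in the text (lengths in nucleotides unless noted).
ORF_START_NT = 41        # position of the first nucleotide of the ATG codon
ORF_LENGTH_NT = 3057     # open reading frame
THREE_PRIME_NT = 599     # 3' region; assumed here to include the stop codon
COMPOSITE_NT = 3696      # composite cDNA length
HEAVY_CHAIN_AA = 784
LIGHT_CHAIN_AA = 235
CALCULATED_MASS_DA = 112_900  # 112.9 kDa


def codon_count(orf_length_nt: int) -> int:
    """Number of codons in an open reading frame whose length is a multiple of 3."""
    if orf_length_nt % 3 != 0:
        raise ValueError("open reading frame length is not a multiple of 3")
    return orf_length_nt // 3


five_prime_nt = ORF_START_NT - 1              # untranslated nucleotides before the ATG
residues = codon_count(ORF_LENGTH_NT)         # encoded amino acids
total_nt = five_prime_nt + ORF_LENGTH_NT + THREE_PRIME_NT
mean_residue_mass_da = CALCULATED_MASS_DA / residues

if __name__ == "__main__":
    print("encoded residues:", residues)
    print("heavy + light chain:", HEAVY_CHAIN_AA + LIGHT_CHAIN_AA)
    print("5' + ORF + 3' (nt):", total_nt)
    print("mean residue mass (Da): %.1f" % mean_residue_mass_da)
```

The residue count from the open reading frame matches the sum of the heavy and light chains, and the three segments sum to the stated composite length.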

Structural Features of Human Enterokinase. Most
structural elements of human enterokinase are highly conserved
(Figure 3). The similarities among the human, bovine, and
porcine enterokinase sequences suggest that the mature
proteins consist of two polypeptide chains derived by
processing of a single-chain precursor. A potential
myristoylation site is present at Gly2 (Rudnick et al., 1993).
Amino acid residues 19-43 are hydrophobic and may
constitute a signal-anchor sequence. The putative heavy
chain contains six sequence motifs that appear to be
homologous to four types of domains found in other proteins
(Figure 4). As reported previously (Kitamoto et al., 1994),
the cleavage site after Lys784 separates the heavy and light
chains of enterokinase, and the light chain is homologous to
the trypsin-like family of serine proteases. In all three cloned
enterokinases, the sequence surrounding this cleavage site
is consistent with the known substrate specificity of trypsin.
Enterokinase domains 1 and 5 are homologous to
cysteine-rich repeats in the low-density lipoprotein receptor (Sudhof



et al., 1985); domain 6 is homologous to a segment of the
macrophage scavenger receptor (Freeman et al., 1990), as
reported previously (Kitamoto et al., 1994).

During the analysis of the bovine enterokinase sequence
(Kitamoto et al., 1994), domain 4 was recognized as a
member of a sequence family that includes two motifs
identified first in complement component C1r (Leytus et al.,
1986). Domain 2 of porcine enterokinase then was found
to belong to the same sequence family (Matsushima et al.,
1994). As indicated in Figures 3 and 4, two C1r/s domains
clearly are present in human, bovine, and porcine
enterokinase.

Domain 3 of bovine enterokinase (Kitamoto et al., 1994)
was recognized as homologous to segments of the
metalloproteases meprin A (Jiang et al., 1992) and meprin B
(Johnson & Hersh, 1992) and to a domain of the A5 protein
of Xenopus laevis (Takagi et al., 1991). The name "meprin
domain" was suggested for this motif (Kitamoto et al., 1994).
However, a previous report had described the same motif in
meprins, the Xenopus A5 protein, and in the extracellular
domain of receptor protein tyrosine phosphatase μ (Gebbink
et al., 1991); the name "MAM" domain was proposed (for
"meprin", "A5", and "mu") (Beckmann & Bork, 1993). The







[Figure 4 graphic: alignment of enterokinase domains 2 and 4 (Hek-2, Hek-4, Bek-2, Bek-4) with C1r/s-type repeats from tolloid and complement component C1r, with a consensus line.]

[Figure 5 graphic: Northern blot lanes for assorted human tissues; RNA size standards at 9.5, 7.5, 4.4, 2.4, and 1.35 kb.]





1 and 3. Domains Hek-2 and Hek-4 are aligned [...]
(Shimell et al., 1991) and from [...] component [...]. The significance of gap alignments was evaluated by [...] obtained for [...] alignments of randomized sequences, using the

recently cloned receptor protein tyrosine phosphatase κ also
contains a MAM domain (Jiang et al., 1993).

The function of the enterokinase heavy-chain domains is
unknown. Related domains in other proteins appear to bind
ligands or mediate protein-protein interactions. For
example, the α-subunit of mouse meprin A associates with the
β-subunit, possibly through MAM domains in each subunit.
This association is required for membrane localization of
the mature α-subunit, which lacks a membrane-spanning
domain (Marchant et al., 1994). Thus, the MAM domain
of enterokinase could interact with other proteins that
contribute to membrane localization or enzyme activity. A
role for the heavy chain in determining substrate specificity
would be consistent with the reported ability of heating
(Barns & Elmslie, 1974; Anderson et al., 1977), acetylation
of amino groups (Baratti & Maroux, 1976), or dissociation
of the light chain by partial reduction (Light & Fonseca,
1984) to selectively impair enterokinase activity toward
trypsinogen without markedly affecting activity toward small
amides or esters.

A few segments of the enterokinase heavy chain show a
notable lack of sequence conservation. A potential
alternatively spliced sequence of 81 nt was identified in several
bovine enterokinase cDNA clones (Kitamoto et al., 1994)
and was present in porcine enterokinase (Matsushima et al.,
1994). This segment overlaps with a 45 nt deletion in human
enterokinase that shortens the heavy chain by 15 amino acids
and deletes one potential N-linked glycosylation site (Figure
3), suggesting that this region may tolerate some variation
in length. This variable segment is rich in hydroxyamino
acids, especially in porcine enterokinase, for which 13 of these
27 amino acids are serine or threonine (Matsushima et al.,
1994). Because of its striking amino acid composition, this
segment was suggested as a possible site of O-linked
glycosylation (Matsushima et al., 1994), although direct
evidence for this modification has not been reported. In
human enterokinase, this segment contains only four
hydroxyamino acids (Figure 3).

Human and porcine enterokinase also lack one amino acid
residue that is found in bovine enterokinase domain 2 (Figure
3); this deletion removes two possible concatenated N-linked
glycosylation sites. Several additional glycosylation sites are
not conserved, so that human, bovine, and porcine
enterokinase heavy chains have 14, 17, and 18 potential
N-glycosylation sites, respectively.
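
The two deletions described above account for the different polypeptide lengths given by the numbering in Figure 3. The Python sketch below is a hedged bookkeeping check, assuming that these two differences are the only length differences among the three sequences; the variable names are hypothetical.

```python
# Total polypeptide lengths from the numbering at the right of Figure 3.
LENGTH_AA = {"human": 1019, "bovine": 1035, "porcine": 1034}

# Differences described in the text, taking the bovine sequence as the reference.
HUMAN_DELETION_NT = 45         # deletion in the variable heavy-chain segment
DOMAIN2_MISSING_RESIDUES = 1   # residue absent from human and porcine domain 2

if HUMAN_DELETION_NT % 3 != 0:
    raise ValueError("deletion would shift the reading frame")
human_deletion_aa = HUMAN_DELETION_NT // 3  # in-frame deletion, in residues

# Expected lengths if these are the only length differences (an assumption).
expected_human = LENGTH_AA["bovine"] - human_deletion_aa - DOMAIN2_MISSING_RESIDUES
expected_porcine = LENGTH_AA["bovine"] - DOMAIN2_MISSING_RESIDUES

if __name__ == "__main__":
    print("residues removed by the 45 nt deletion:", human_deletion_aa)
    print("expected human length:", expected_human)
    print("expected porcine length:", expected_porcine)
```

Because 45 is a multiple of 3, the human deletion preserves the reading frame, and the expected lengths agree with the Figure 3 numbering for all three species.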



Figure 5: Expression of enterokinase in human tissues. A Northern
blot of human poly(A)+ RNA from assorted human tissues (2 μg/
lane) was hybridized with radiolabeled cDNA probes as described
under Experimental Procedures. The upper panel shows
hybridization with an enterokinase cDNA probe derived from clone HEK1,
exposed to X-ray film for 10 days. The lower panel shows the same
blot after being stripped and rehybridized with human β-actin cDNA
probe, exposed for 2 h. The mobility of RNA size standards is
indicated at the left.

Tissue Distribution of Enterokinase mRNA. By Northern
blotting of human poly(A)+ RNA, an ≈4.4 kb mRNA for
enterokinase was detected in small intestine. No expression
was observed in leukocytes, colon, ovary, testis, prostate,
thymus, spleen, pancreas, kidney, skeletal muscle, liver, lung,
placenta, brain, or heart (Figure 5). A band of similar size
was detected by Northern blotting of RNA from bovine
duodenum with a bovine enterokinase cDNA probe (data
not shown). These results are consistent with the [...]
of enterokinase activity (Pavlov, 1902; Lojda & Malis, [...])
and antigen (Miyoshi et al., 1990) almost exclusively in
enterocytes of proximal small intestine.

Chromosome Localization of the Human Enterokinase
Gene. Fluorescent in situ hybridization was used to
physically localize the human enterokinase gene. To [...]
adequate hybridization signal, the inserts of cDNA clones
HEK1 and HEK6 were mixed, thereby including [...]
of the cDNA sequence. The DNA was labeled with biotin







Human Enterokinase 



Biochemistry, Vol. 34, No. 14, 1995 4567







FIGURE 6: Fluorescent in situ hybridization localization of the enterokinase gene to human chromosome 21q21. Five metaphase spreads are
shown. Arrows indicate biotin-labeled probe hybridization (color) and the position of the same spreads banded using Giemsa dye. Also
shown is an idiogram of chromosome 21 with band q21, to which the probes hybridize, indicated by an arrowhead.



and hybridized to prometaphase spreads of human chromosomes.
Labeled DNA was detected with fluorescein isothiocyanate-conjugated
avidin DCS and amplified with fluorescein
isothiocyanate-conjugated goat anti-avidin D antibodies.
[Number illegible] independent metaphase spreads were analyzed, and five
representative spreads are shown (Figure 6). Specific
hybridization of the enterokinase cDNA probe was observed
to chromosome 21; no consistent secondary hybridization
was detected. 4,6-Diamidino-2-phenylindole dihydrochloride
staining and Giemsa banding confirmed the location of the
hybridization signals on chromosome 21 band q21.

The human enterokinase locus appears to be close to the
gene for β-amyloid precursor protein at 21q21.2 (Nizetic et
al., 1994), which is mutated in one form of inherited
Alzheimer disease (Goate et al., 1991), and to the gene for
superoxide dismutase at 21q22.1, which is mutated in familial
amyotrophic lateral sclerosis (Rosen et al., 1993). Enterokinase
also is in or near a region implicated in specific
features of Down syndrome, although the precise locations
of chromosome 21 segments that contribute to Down
syndrome remain unknown (Korenberg et al., 1994). The
cloning of cDNA for human enterokinase will enable fine
structure physical and genetic mapping of these loci and the
characterization of mutations in congenital enterokinase
deficiency (Hadorn et al., 1969; Haworth et al., 1971). These
clones also facilitate the study of biosynthetic targeting to
apical brush border membranes, zymogen activation, and
substrate specificity of human enterokinase.



ACKNOWLEDGMENT 

We thank Lisa Westfield for synthesis of oligonucleotides
and Drs. Xin Yuan and Qingyu Wu for many helpful
discussions.

REFERENCES 

Anderson, L. E., Walsh, K. A., & Neurath, H. (1977) Biochemistry 16, 3354-3360.

Baratti, J., & Maroux, S. (1976) Biochim. Biophys. Acta 452, 488-496.

Baratti, J., Maroux, S., Louvard, D., & Desnuelle, P. (1973) Biochim. Biophys. Acta 315, 147-161.

Barns, R. J., & Elmslie, R. G. (1974) Biochim. Biophys. Acta 350, 495-498.

Beckmann, G., & Bork, P. (1993) Trends Biochem. Sci. 18, 40-41.

Dangott, L. J., Jordan, J. E., Bellet, R. A., & Garbers, D. L. (1989) Proc. Natl. Acad. Sci. U.S.A. 86, 2128-2132.

Davie, E. W., & Neurath, H. (1955) J. Biol. Chem. 212, 515-529.

Feinberg, A. P., & Vogelstein, B. (1983) Anal. Biochem. 132, 6-13.

Fonseca, P., & Light, A. (1983) J. Biol. Chem. 258, 14516-14520.

Freeman, M., Ashkenas, J., Rees, D. J. G., Kingsley, D. M., Copeland, N. G., Jenkins, N. A., & Krieger, M. (1990) Proc. Natl. Acad. Sci. U.S.A. 87, 8810-8814.

Gebbink, M. F. B. G., van Etten, I., Hateboer, G., Suijkerbuijk, R., Beijersbergen, R. L., Geurts van Kessel, A., & Moolenaar, W. H. (1991) FEBS Lett. 290, 123-130.

Gish, W., & States, D. J. (1993) Nature Genet. 3, 266-272.

Goate, A., Chartier-Harlin, M.-C., Mullan, M., Brown, J., Crawford, F., Fidani, L., Giuffra, L., Haynes, A., Irving, N., James, L., Mant, R., Newton, P., Rooke, K., Roques, P., Talbot, C., Pericak-Vance, M., Roses, A., Williamson, R., Rossor, M., Owen, M., & Hardy, J. (1991) Nature 349, 704-706.

Gunning, P., Ponte, P., Okayama, H., Engel, J., Blau, H., & Kedes, L. (1983) Mol. Cell. Biol. 3, 787-795.

Hadorn, B., Tarlow, M. J., Lloyd, J. K., & Wolff, O. H. (1969) Lancet 1, 812-813.

Haworth, J. C., Gourley, B., Hadorn, B., & Sumida, C. (1971) J. Pediatr. 78, 481-490.

Jiang, W., Gorbea, C. M., Flannery, A. V., Beynon, R. J., Grant, G. A., & Bond, J. S. (1992) J. Biol. Chem. 267, 9185-9193.

Jiang, Y.-P., Wang, H., D'Eustachio, P., Musacchio, J. M., Schlessinger, J., & Sap, J. (1993) Mol. Cell. Biol. 13, 2942-2951.

Johnson, G. D., & Hersh, L. B. (1992) J. Biol. Chem. 267, 13505-13512.

Kitamoto, Y., Yuan, X., Wu, Q., McCourt, D. W., & Sadler, J. E. (1994) Proc. Natl. Acad. Sci. U.S.A. 91, 7588-7592.

Korenberg, J. R., Chen, X.-N., Schipper, R., Sun, Z., Gonsky, R., Gerwehr, S., Carpenter, N., Daumer, C., Dignan, P., Disteche, C., Graham, J. M., Jr., Hudgins, L., McGillivray, B., Miyazaki, K., Ogasawara, N., Park, J. P., Pagon, R., Pueschel, S., Sack, G., Say, B., Schuffenhauer, S., Soukup, S., & Yamanaka, T. (1994) Proc. Natl. Acad. Sci. U.S.A. 91, 4997-5001.

Kunitz, M. (1939) J. Gen. Physiol. 22, 429-446.



LaVallie, E. R., Rehemtulla, A., Racie, L. A., DiBlasio, E. A., Ferenz, C., Grant, K. L., Light, A., & McCoy, J. M. (1993) J. Biol. Chem. 268, 23311-23317.

Leytus, S. P., Kurachi, K., Sakariassen, K. S., & Davie, E. W. (1986) Biochemistry 25, 4855-4863.

Lichter, P., Cremer, T., Borden, J., Manuelidis, L., & Ward, D. C. (1988) Hum. Genet. 80, 224-234.

Liepnieks, J. J., & Light, A. (1979) J. Biol. Chem. 254, 1677-1683.

Light, A., & Fonseca, P. (1984) J. Biol. Chem. 259, 13195-13198.

Lojda, Z., & Malis, F. (1972) Histochemie 32, 23-29.

Magee, A. I., Grant, D. A. W., & Hermon-Taylor, J. (1981) Clin. Chim. Acta 115, 241-254.

Marchand, P., Tang, J., & Bond, J. S. (1994) J. Biol. Chem. 269, 15388-15393.

Matsushima, M., Ichinose, M., Yahagi, N., Kakei, N., Tsukada, S., Miki, K., Kurokawa, K., Tashiro, K., Shiokawa, K., Shinomiya, K., Umeyama, H., Inoue, H., Takahashi, T., & Takahashi, K. (1994) J. Biol. Chem. 269, 19976-19982.

Miyoshi, Y., Onishi, T., Sono, T., & Komi, N. (1990) Gastroenterol. Jpn. 25, 320-327.

Naude, R. J., Da Silva, D., Edge, W., & Oelofsen, W. (1993) Comp. Biochem. Physiol. 105B, 591-595.

Nizetic, D., Gellen, L., Hamvas, R. M. J., Mott, R., Grigoriev, A., Vatcheva, R., Zehetner, G., Yaspo, M.-L., Dutriaux, A., Lopes, C., Delabar, J.-M., Van Broeckhoven, C., Potier, M.-C., & Lehrach, H. (1994) Hum. Mol. Genet. 3, 759-770.

Pavlov, I. P. (1902) The Work of the Digestive Glands, 1st ed., Charles Griffin & Co., London.

Rigby, P. W. J., Dieckmann, M., Rhodes, C., & Berg, P. (1977) J. Mol. Biol. 113, 237-251.

Rosen, D. R., Siddique, T., Patterson, D., Figlewicz, D. A., Sapp, P., Hentati, A., Donaldson, D., Goto, J., O'Regan, J. P., Deng, H.-X., Rahmani, Z., Krizus, A., McKenna-Yasek, D., Cayabyab, A., Gaston, S. M., Berger, R., Tanzi, R. E., Halperin, J. J., Herzfeldt, B., Van den Bergh, R., Hung, W.-Y., Bird, T., Deng, G., Mulder, D. W., Smyth, C., Laing, N. G., Soriano, E., Pericak-Vance, M. A., Haines, J., Rouleau, G. A., Gusella, J. S., Horvitz, H. R., & Brown, R. H., Jr. (1993) Nature 362, 59-62.

Rudnick, D. A., McWherter, C. A., Gokel, G. W., & Gordon, J. I. (1993) Adv. Enzymol. 67, 375-430.

Sambrook, J., Fritsch, E. F., & Maniatis, T. (1989) in Molecular Cloning: A Laboratory Manual, 2nd ed., pp 387-389, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

Sanger, F., Nicklen, S., & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U.S.A. 74, 5463-5467.

Shimell, M. J., Ferguson, E. L., Childs, S. R., & O'Connor, M. B. (1991) Cell 67, 469-481.

Sudhof, T. C., Goldstein, J. L., Brown, M. S., & Russell, D. W. (1985) Science 228, 815-822.

Takagi, S., Hirata, T., Agata, K., Mochii, M., Eguchi, G., & Fujisawa, H. (1991) Neuron 7, 295-307.

Yamashina, I. (1956) Acta Chem. Scand. 10, 739-743.

Yunis, J. J. (1976) Science 191, 1268-1270.

BI9S001II 





Exhibit 19



Biochemistry 1988, 27, 1067-1074 






A Novel Trypsin-like Serine Protease (Hepsin) with a Putative Transmembrane
Domain Expressed by Human Liver and Hepatoma Cells*

Steven P. Leytus, Keith R. Loeb, Frederick S. Hagen, Kotoku Kurachi, and Earl W. Davie

Department of Biochemistry, University of Washington, Seattle, Washington 98195, and ZymoGenetics, Inc., 2121 North 35th
Street, Seattle, Washington 98103

Received August 24, 1987; Revised Manuscript Received October 16, 1987



ABSTRACT: cDNA clones coding for a new serine protease (hepsin) have been
isolated from cDNA libraries prepared from human liver and hepatoma cell line mRNA. The total length
of the cDNA is approximately 1.8 kilobases and includes a 5' untranslated region, 1251 nucleotides coding
for a protein of 417 amino acids, a 3' untranslated region, and a poly(A) tail. The amino acid sequence
coded by the cDNA for hepsin shows a high degree of identity to pancreatic trypsin and other serine proteases
present in plasma. It also exhibits features characteristic of zymogens to serine proteases in that it contains
a cleavage site for protease activation and the highly conserved regions surrounding the His, Asp, and
Ser residues that constitute the catalytic triad in serine proteases. In addition, hepsin lacks a typical
amino-terminal signal peptide. Hydropathy analysis of the protein sequence, however, revealed a very
hydrophobic region of 27 amino acids starting 18 residues downstream from the apparent initiator Met.
This region may serve as an internal signal sequence and a transmembrane domain. This putative
transmembrane domain could be involved in anchoring hepsin to the cell membrane and orienting it in
such a manner that its carboxyl terminus, containing the catalytic domain, is extracellular.



Many biological processes which require specific, limited
proteolysis are mediated by a member(s) of the serine protease
family of proteolytic enzymes. These proteases exist as single-
or two-chain zymogens that are activated by specific and
limited proteolytic cleavage (Neurath & Walsh, 1976). They
contain three principal active-site amino acids (His, Asp, and
Ser) that participate in peptide bond hydrolysis (Blow et al.,
1969). In addition, they share considerable structural similarities
in their catalytic chains.

Among the best-studied serine proteases are those that are
found in plasma. These enzymes are involved in processes such
as blood coagulation (Davie et al., 1979), fibrinolysis
(Christman et al., 1977; Collen, 1980), and complement activation
(Reid & Porter, 1981). The active form of most
plasma serine proteases consists of two polypeptide chains held
together by a disulfide bond(s), a highly conserved catalytic
chain derived from the carboxyl-terminal end of the precursor
polypeptide, and a unique noncatalytic chain derived from the
amino-terminal portion of the polypeptide chain. The presence
of a noncatalytic chain(s) distinguishes the plasma serine
proteases from the digestive proteases of the pancreas. By
mediating interactions with other proteins or surfaces, noncatalytic
chains influence the action of plasma serine proteases
on their selected substrates. The biosynthesis of most of the
serine proteases present in plasma occurs in the liver. Although
at least 20 different serine proteases synthesized in the liver
have been described thus far, it is quite likely that many more
exist.

Recent reports have identified a number of new serine 
proteases produced in different tissues and cell types. Cook 

*This work was supported in part by research grants (HL 16919 and
HL 31511) and a postdoctoral fellowship (GM 09118 to S.P.L.) from
the National Institutes of Health.
University of Washington.

Present address: Department of Biochemistry, Medical College of
Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI 53226.
ZymoGenetics, Inc.

Present address: Department of Human Genetics, 4708 Medical
[illegible], University of Michigan Medical School, Ann Arbor, MI 48109.



et al. (1985, 1987) have described a cDNA coding for a new
serine protease that is expressed during adipocyte differentiation.
Gershenfeld and Weissman (1986) and Lobe et al.
(1986) have cloned cDNAs coding for new serine proteases
expressed by cytotoxic T lymphocytes. Newly characterized
proteins have also been isolated from cytotoxic T lymphocytes
(Pasternack et al., 1986; Young et al., 1986; Masson &
Tschopp, 1987), liver (Tanaka et al., 1986), ovary (Eisenhauer
& McDonald, 1986), pituitary gland (Cromlish et al., 1986),
embryo fibroblast cells (Billings et al., 1987), seminal plasma
(Watt et al., 1986), submaxillary gland (Lundgren et al.,
1984), and tumor cells (LaBombardi et al., 1983) that exhibit
properties typical of serine proteases. Additional new proteases
have been reported, but not all have been identified as belonging
to the serine protease family. Although the majority
of serine proteases are synthesized with signal peptides that
direct their secretion outside of the cell, some of the new serine
proteases recently reported may be associated with cell membranes
(LaBombardi et al., 1983; Tanaka et al., 1986).

As a general approach to isolating cDNAs coding for serine
proteases synthesized in the liver, a strategy was chosen that
involved screening a human liver cDNA library with a synthetic
oligodeoxynucleotide probe coding for a highly conserved
amino acid sequence known to exist in a number of different
serine proteases. In this manner, recombinant clones were
isolated that contained cDNA inserts coding for serine proteases
synthesized in the liver, including human factor IX
(Kurachi & Davie, 1982), prothrombin (Degen et al., 1983),
and complement C1r (Leytus et al., 1986a). In this paper,
we report the isolation and characterization of the cDNA
coding for a new trypsin-like serine protease. This hepatocyte-expressed
protease has been called hepsin.

Experimental Procedures 

DNA restriction endonucleases and DNA modification
enzymes were purchased from Bethesda Research Laboratories
or New England Biolabs. 32P-Labeled nucleotides used in
nick-translating cDNA fragments (Maniatis et al., 1982) and
5'-end-labeling synthetic oligodeoxynucleotides (Maxam &



0006-2960/88/0427-1067$01.50/0 © 1988 American Chemical Society



1068 BIOCHEMISTRY 

Gilbert, 1980) were obtained from New England Nuclear.
[α-35S]dATP and nonradioactive nucleotides used for DNA
sequencing were products of Amersham and Pharmacia, respectively.
A mixture of tetradecadeoxynucleotides (used to
screen the plasmid cDNA library) was synthesized by P-L
Biochemicals and contained the following sequence:

G 

I 

5' C-C-A-G-C-G-C-A-G-A-A-C-A-T 3'



LEYTUS ET AL.



A cDNA library prepared from human liver mRNA was
kindly provided by Drs. S. L. Woo and T. Chandra of the
Baylor College of Medicine. The library contained cDNA
inserted into the PstI site of plasmid pBR322 (Chandra et al.,
1983). In addition, a cDNA library prepared from human
hepatoma cell line (Hep G2) mRNA was also used. This
library contained cDNA inserted into the EcoRI site of bacteriophage
vector λgt11 (Hagen et al., 1986). The plasmid
library was prepared for colony hybridization (Gergen et al.,
1979) and the λgt11 library for plaque hybridization (Benton
& Davis, 1977) according to established procedures. Hybridization
conditions using 32P-labeled synthetic oligodeoxynucleotide
and cDNA probes were the same as described
previously (Leytus et al., 1986a).

DNA from recombinant phage was prepared according to
Maniatis et al. (1982) with minor modifications (Leytus et
al., 1986a). cDNA inserts were released from the recombinant
phage DNA by digestion with EcoRI, and a selected number
of these were then subcloned into the EcoRI site of a pUC
plasmid vector (Vieira & Messing, 1982). Plasmid DNA was
prepared by a modification of the alkaline extraction procedure
of Birnboim and Doly (1979), essentially as described by
Micard et al. (1985).

Selected fragments from restriction enzyme digests of recombinant
plasmids were subcloned into M13 bacteriophage
vectors by the method of Messing (1983). These were then
sequenced by the dideoxy chain terminator method of Sanger
et al. (1977), employing the modifications described by Biggin
et al. (1983). DNA sequences were analyzed by the computer
program GENEPRO (Version 4.0, Riverside Scientific Enterprises,
Seattle, WA). Protein sequences were also analyzed
by using GENEPRO and the computer programs SEARCH
(Dayhoff, 1979) and ALIGN (Dayhoff, 1983).

Results 

A plasmid cDNA library prepared from human liver
mRNA and containing approximately 14000 recombinant
colonies was screened with a mixture of synthetic tetradecadeoxynucleotide
sequences (Leytus et al., 1986a). These sequences
were complementary to the mRNA sequence coding
for the amino acids Met-Phe-Cys-Ala-Gly. The sequence
Met-X-Cys-Ala-Gly is highly conserved in many serine proteases
and is found approximately 15 amino acids prior to the
active-site serine. Among the 31 strongly hybridizing clones
that were initially identified, 14 contained cDNA inserts coding
for prothrombin, 9 for C1r, 2 for factor IX, and 5 for an
unidentified protein whose cDNA contained a single nucleotide
mismatch with the hybridization probe (Leytus et al., 1986a).
The last clone (designated HUW1250) coded for a serine
protease and has now been examined more extensively.
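
The relationship between the probe and the conserved peptide can be illustrated with a short sketch. It assumes the tetradecamer printed under Experimental Procedures, ignoring the alternative G position, whose location cannot be recovered from this copy. The helper names and the four-entry codon table are hypothetical and exist only for this illustration.

```python
# Illustrative sketch: the probe is antisense to the mRNA, so its reverse
# complement should read as codons for Met-Phe-Cys-Ala followed by the
# first two bases of a Gly codon (GGN).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Only the codons needed here (standard genetic code).
CODONS = {"ATG": "Met", "TTC": "Phe", "TGC": "Cys", "GCT": "Ala"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def translate_prefix(seq):
    """Translate complete codons from the 5' end; ignore trailing bases."""
    usable = len(seq) - len(seq) % 3
    return [CODONS[seq[i:i + 3]] for i in range(0, usable, 3)]


probe = "CCAGCGCAGAACAT"            # 5'->3', 14 nucleotides, as printed
sense = reverse_complement(probe)   # strand with the same sense as the mRNA
peptide = translate_prefix(sense)   # residues encoded by the complete codons
trailing = sense[12:]               # the two bases left over after 4 codons
```

Under these assumptions the sense strand reads ATG TTC TGC GCT GG, that is, Met-Phe-Cys-Ala plus the first two bases of a glycine codon.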

By Southern transfer and hybridization analysis, the site
in HUW1250 responsible for hybridizing to the synthetic
oligodeoxynucleotide probe was localized, and the nucleotide





FIGURE 1: Restriction endonuclease map of the cDNA coding for
human hepsin. The schematic representation of several of the cDNA
inserts and a summary of the strategy used to sequence portions of
these inserts are shown. The solid, open, and slashed regions represent
5' untranslated, coding, and 3' untranslated regions, respectively, within
a cDNA insert. The stippled regions represent improperly spliced
intronic sequence found in clones HepG2UW17 and HepG2UW2.
Arrows indicate the direction and extent of sequencing obtained from
the M13 subclones. The numbers at the 5' end of each insert refer
to positions within the nucleotide sequence of the cDNA (see Figure
2). Sequencing strategy for the apparent intron fragments is not shown.

sequence of this region was determined. A DNA sequence
was found that matched perfectly with one of the sequences
in the oligodeoxynucleotide mixture used as a probe. Closely
following the DNA sequence that coded for Met-Phe-Cys-Ala-Gly
and in the same reading frame was an amino acid
sequence of Gly-Asp-Ser-Gly-Gly-Pro. The latter amino acid
sequence represents the most highly conserved region in serine
proteases and contains the active-site Ser residue. Since the
deduced amino acid sequence flanking this highly conserved
region did not match with any known serine protease, it appeared
that HUW1250 coded for a new serine protease. This
new enzyme has been called hepsin.

Following the sequencing strategy shown in Figure 1, the
complete nucleotide sequence of HUW1250 was determined
[nucleotides 585-1783 (Figure 2)]. A number of other amino
acid sequences that are highly conserved in most serine proteases
were also present in hepsin. These included an
Arg-Ile-Val-Gly-Gly activation site region (residues 162-166), a
Thr-Ala-Ala-His-Cys active-site His region (residues
200-204), an Asp-Ile-Ala-Leu-Val active-site Asp region
(residues 257-261), and also the Met-Phe-Cys-Ala-Gly
oligodeoxynucleotide probe site (residues 336-340) and the
Gly-Asp-Ser-Gly-Gly-Pro active-site Ser region (residues
351-356). Furthermore, the relative positions of all of these
conserved regions in hepsin were the same as they occur in
other serine proteases. Although HUW1250 contained a
poly(A) tail, it was apparent that it did not represent a full-length
cDNA since the nucleotide sequence 5' to the sequence
coding for the Arg-Ile-Val-Gly-Gly activation site did not code
for a Met residue that could serve as a site for initiation of
translation.

In order to isolate clones with larger cDNA inserts, approximately
960000 recombinants from a Hep G2 cell line
cDNA library (constructed in bacteriophage λgt11) were





cDNA CODING FOR HUMAN HEPSIN

VOL. 27, NO. 3, 1988 1069

[Figure 2: nucleotide sequence of the hepsin cDNA with the predicted amino acid sequence printed above it; sequence panel not legible]

FIGURE 2: Nucleotide sequence of the cDNA coding for human hepsin. The sequence was determined by analysis of the cDNA inserts shown
in Figure 1. The predicted amino acid sequence is shown above the DNA sequence. The solid, inverted triangle marks the location of the
inserted sequence found in clones HepG2UW17 and HepG2UW2 (see Figure 1). This sequence is not included in Figure 2. The boxed amino
acid sequence represents a potential transmembrane domain. The solid arrow identifies an Arg-Ile bond that is probably cleaved when the
inactive zymogen is converted to an active protease. The active-site His, Asp, and Ser residues are circled. The underlined nucleotide sequence
is the site responsible for hybridizing to the synthetic oligodeoxynucleotide probe.



screened by using the entire cDNA insert from HUW1250 
as a hybridization probe. Approximately 70 positive clones 
were identified in the initial screening, and most of these were 
plaque purified. Phage DNA was then prepared from 19 of 
these clones. 

Digestion of the recombinant phage DNAs with EcoRI
released inserts that ranged in size from approximately 800
to 1800 base pairs (bp). Two of these inserts (HepG2UW7
and HepG2UW20) were selected for further analysis. A 160
bp EcoRI-[illegible] fragment derived from the extreme 5' end of
HepG2UW7 was then employed as a hybridization probe, and
the original 70 positives were rescreened. Subsequently, five
additional clones, designated HepG2UW2, HepG2UW17,
HepG2UW19, HepG2UW61, and HepG2UW63, were also
selected for DNA sequence analysis. A restriction enzyme
map for the seven cDNA inserts obtained from the Hep G2
library is shown in Figure 1. The strategy used to determine
the cDNA sequence of hepsin from the various clones is also
described in Figure 1.

The complete nucleotide sequence of the cDNA coding for
hepsin is shown in Figure 2, along with the deduced amino
acid sequence. The total length of the cDNA was 1783 bp.
This is consistent with the size of the mRNA for hepsin present
in Hep G2 cells as determined by Northern blot analysis (data
not shown). The cDNA includes 245 nucleotides of untranslated
sequence at the 5' end, 1251 nucleotides coding for
a protein of 417 amino acids, a stop codon of TGA, and 284
nucleotides of untranslated sequence at the 3' end. The ATG
codon at positions 246-248 was assigned as that coding for
the initiator Met since it is the most 5'-proximal codon
specifying a Met after the stop codon of TGA at positions
138-140. The "first ATG rule" reportedly holds for the vast
majority of eucaryotic mRNAs (Kozak, 1984). The nucleotide
sequence surrounding the tentative initiator Met codon is
GAC ATG G. This differs somewhat from the optimal sequence
of ACC ATG G for translation initiation sites proposed
by Kozak (1986). A purine is present, however, in a critical
position located three nucleotides upstream of the ATG codon.
The length of 5' untranslated regions in eucaryotic mRNAs
can vary, with the majority (≈70%) being in the range of
20-[illegible]0 nucleotides (Kozak, 1984). The 245 nucleotides
upstream from the apparent initiator Met represent a rather long



[Figure 3 plot: hydropathy (vertical axis, -1.0 to 2.0) versus sequence number (horizontal axis)]

FIGURE 3: Hydropathy analysis of the deduced amino acid sequence
of hepsin. The method of Kyte and Doolittle (1982) was employed,
using a window of 20 residues. The peak spanning residues 18-44
represents the putative transmembrane domain.

5' untranslated region for hepsin. Although the precise role
of the 5' untranslated sequence in mRNAs has not been
established, it has been suggested that secondary structure(s)
in long 5' untranslated regions may be involved in the regulation
of transcription or translation (Kozak, 1984).
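
The segment lengths reported for the hepsin cDNA can be cross-checked with simple arithmetic. The sketch below uses only the figures stated in the text; the variable names are illustrative.

```python
# Segment lengths as stated in the text (nucleotides).
five_prime_utr = 245
coding = 1251
stop_codon = 3        # TGA
three_prime_utr = 284

# Sum of all segments, to compare with the reported cDNA length.
total = five_prime_utr + coding + stop_codon + three_prime_utr

# Three nucleotides per codon gives the length of the encoded protein.
residues = coding // 3

# The initiator ATG immediately follows the 5' untranslated region.
initiator_start = five_prime_utr + 1
initiator_end = initiator_start + 2
```

The segments sum to the reported 1783 bp, the coding region corresponds to 417 residues, and the initiator codon falls at positions 246-248, all in agreement with the values given in the text.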

In contrast to most other serine proteases, the cDNA sequence
coding for hepsin did not predict the presence of a
typical signal peptide. However, hydropathy analysis (Kyte
& Doolittle, 1982) revealed the presence of a single, very
hydrophobic domain of 27 residues near the amino terminus
of the molecule (residues 18-44, Figure 3). This hydrophobic
domain, starting 18 residues downstream from the apparent
initiator Met, contains no charged amino acids and is sufficiently
long and nonpolar to span a lipid bilayer. Furthermore,
this potential membrane-spanning domain is flanked on either
side by charged amino acids, which may serve to help anchor
the protein in a membrane.
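
A sliding-window hydropathy scan of the kind described above can be sketched as follows. This is a minimal illustration, not the program used by the authors: the function names are hypothetical, and the test sequence is an invented toy peptide rather than hepsin, whose sequence is not legible in this copy. The scale values are the standard Kyte-Doolittle hydropathy indices, and the default window of 20 matches the Figure 3 legend.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=20):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def most_hydrophobic_window(protein, window=20):
    """Return (1-based start residue, mean score) of the top window."""
    profile = hydropathy_profile(protein, window)
    best = max(range(len(profile)), key=profile.__getitem__)
    return best + 1, profile[best]


# Hypothetical toy sequence, NOT hepsin: a charged stretch, a 20-residue
# nonpolar stretch, then another charged stretch.
toy = "MKDEKRSEDK" + "LLVALLAVGLLIAVVLLAIG" + "KRDEESKKDE"
start, score = most_hydrophobic_window(toy)
```

For the toy peptide the highest-scoring window begins at residue 11, where the nonpolar stretch starts, which is the same kind of signal that identifies residues 18-44 in Figure 3.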

From restriction enzyme mapping and DNA sequencing,
it was found that clones HepG2UW17 and HepG2UW2 had
additional sequences near their 5' ends that were not present
in the other cDNA inserts. Beginning at position 192 in the
nucleotide sequence, clone HepG2UW17 contained an additional
580 bp of DNA. This sequence was as follows:
GTAAGGACAAGGGCCCCCAGACTCACAGTTCCA- 
GCCCTGAGGACAGGGGTTCCCTCATCCCCCCAC- 
CCAGCCTAATGCCCACCTCCTAATAGAGGGGTT- 
CCTGGGGACCTGAAGAGGGGGCACTATGACGT- 
CTCCCCAAGCACCTAGGTC3TTCTGTCCTGCTCT- 
TCCTTCAGACTCAGCCGTTGGACCCCAGTCCTTT- 
CCTCCCCAGACCCAGGAGTTCCAGCCCTCAGGC- 
CCCTCCTCCCTCATACTAGGGAGTCCTGGCCCO 
CAAATTCCTCCTTTCCCAAGACTTATGATTTCA- 
GGTCCTCAGCTGTCTCCTCCCTCAAACCGGGAT- 
CCrrCAGTCCCCTGCTCCACCAGGCTCAGGCATG- 
GGGGTCCCCATCCCTGCAAATCCAGGCGTCCCC- 
CCGCTGCTGGTCAGACACTGACCCCATCCTTGA- 
ACCCAGCCCAATCTGCGTCCGTGATCACGGCGT- 
GCTCTGGCCAAGGCCCAGTCCCTACAGCCTGCC- 
TGGATGGACGCCTGGGACTGGGGGCGCCAGGA- 
CTGGGCTGGGCTGGGCTCCCCCAGGCCCTGCCT- 
CCCCGTCCATCTCCTCACAG. Analysis of this sequence
suggests that this insertion probably represents an unspliced
intron or a remnant of an intron. The underlined hexanucleotide
sequences at the beginning and end of this sequence,
GTAAGG and TCACAG, respectively, conform to consensus
hexanucleotide sequences found at the 5' and 3' ends of introns
adjacent to intron/exon splice junctions (Breathnach &
Chambon, 1981; Nevins, 1983). The GTAAGG donor site
and the TCACAG acceptor site are probably used for
splicing-out this intronic sequence in the majority of the mRNA
molecules coding for hepsin. In the case of clone
HepG2UW17, this sequence was not spliced-out when the
mRNA molecule that gave rise to this particular insert was
being processed. The additional sequence near the 5' end of
clone HepG2UW2 is also probably due to improper splicing
of the same intron. In this case, the cellular splicing apparatus
apparently used the proper donor site (GTAAGG, underlined
above), but an alternative acceptor site (ACCCAG, underlined
above). This removed most of the intronic sequence but left
behind 145 nucleotides. With the exception of these two
probable splicing errors, no other differences were detected
among the cDNA inserts in regions where overlapping sequences
were obtained.
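
The statement that the alternative acceptor site leaves 145 nucleotides behind can be checked against the printed insertion. The sketch below takes the first six bases and the final five printed lines of the sequence (line-wrap hyphens removed) exactly as they appear above. The variable names are illustrative, and the check covers only this legible tail, not the full 580 bp.

```python
# First six bases and final stretch of the printed insertion
# (line-wrap hyphens removed).
insert_start = "GTAAGG"
insert_tail = (
    "ACCCAGCCCAATCTGCGTCCGTGATCACGGCGT"
    "GCTCTGGCCAAGGCCCAGTCCCTACAGCCTGCC"
    "TGGATGGACGCCTGGGACTGGGGGCGCCAGGA"
    "CTGGGCTGGGCTGGGCTCCCCCAGGCCCTGCCT"
    "CCCCGTCCATCTCCTCACAG"
)

# Introns conventionally begin with GT and end with AG.
follows_gt_ag_rule = (
    insert_start.startswith("GT") and insert_tail.endswith("AG")
)

# Splicing at the alternative acceptor removes everything up to and
# including ACCCAG, so the retained part is whatever follows it.
acceptor = "ACCCAG"
cut = insert_tail.rfind(acceptor) + len(acceptor)
retained = insert_tail[cut:]
```

Counting the bases that follow ACCCAG in the printed tail gives 145, matching the number of nucleotides said to remain in clone HepG2UW2.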

At the 3' end of the cDNA, the sequence of AATAAA was
present 14 nucleotides upstream from the polyadenylation site.
This sequence, which generally occurs 10-30 nucleotides upstream
from the poly(A) tail, apparently functions as a signal
for polyadenylation by either specifying the proper cleavage
site of mRNA transcripts or serving as a recognition sequence
for poly(A) polymerase (Proudfoot & Brownlee, 1976; Nevins,
1983).

The base composition of the cDNA coding for hepsin was
particularly rich in G and C. The total nucleotide composition
was calculated to be 17.0% A, 19.1% T, 31.2% G, and 32.5%
C. The 245 bp 5' untranslated region contained an even higher
content of C, and its base composition was calculated to be
17.1% A, 12.6% T, 28.5% G, and 41.6% C.
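
A base composition of this kind is a simple count of each nucleotide divided by the total. The sketch below shows the calculation on an invented 20-base example, since the hepsin sequence itself is not legible in this copy. The function name is hypothetical and this is not the GENEPRO routine used by the authors.

```python
from collections import Counter


def base_composition(seq):
    """Percentage of each base in a DNA sequence, rounded to 0.1%."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ATGC")
    return {b: round(100 * counts[b] / total, 1) for b in "ATGC"}


# Hypothetical 20-base example, not taken from the hepsin cDNA.
example = "GCCGCGATGCCCTAGGCCGC"
composition = base_composition(example)
gc_content = composition["G"] + composition["C"]
```

Applied to the full cDNA, the same counting would yield the four percentages quoted above; each of the two quoted sets sums to 99.8%.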

Besides the open reading frame that codes for hepsin, an
unusually long open reading frame was observed in the inverted
sequence of this cDNA. This open reading frame spanned 1353
nucleotides (nucleotides 105-1457 in the inverted sequence). The
amino acid sequence deduced from this open reading frame was used
in a search of the protein sequence database (National Biomedical
Research Foundation, Washington, DC), but little significant
sequence identity was found with any other known protein.
Furthermore, there were no Met residues in the deduced amino acid
sequence that could serve as a start site for translation.
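
A search of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' procedure: "inverted sequence" is taken here to mean the reverse complement, an open reading frame is taken to mean a run of codons without a stop codon (no start codon is required, in line with the absence of Met noted above), and the function names and toy sequence are hypothetical. The span 105-1457 is 1353 nucleotides, or 451 codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_stop_free_run(seq):
    """Return (start, end), 1-based inclusive, of the longest stretch of
    whole codons without a stop codon, searching all three frames."""
    seq = seq.upper()
    best_start, best_len = 0, 0
    for frame in range(3):
        run_start = frame
        # Index just past the last complete codon in this frame.
        frame_end = frame + ((len(seq) - frame) // 3) * 3
        for i in range(frame, frame_end, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if i - run_start > best_len:
                    best_start, best_len = run_start, i - run_start
                run_start = i + 3
        if frame_end - run_start > best_len:
            best_start, best_len = run_start, frame_end - run_start
    return best_start + 1, best_start + best_len


# Hypothetical toy sequence; a real run would pass reverse_complement(cdna).
print(longest_stop_free_run("TAAGCTGCTGCTTGA"))
```

For the real data one would call `longest_stop_free_run(reverse_complement(cdna))` and compare `end - start + 1` with the reported 1353 nucleotides.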

Discussion 

Analysis of the cDNA sequence presented for hepsin in- 
dicates that it codes for a protein that is a member of the serine 
protease family. The cDNA coding for hepsin was isolated 
from cDNA libraries prepared from human liver and Hep G2 
cell line mRNA. Preliminary data by Northern analysis in- 
dicate that the mRNA coding for hepsin is also expressed in 
a human osteosarcoma cell line. It is either not expressed or 
expressed only at very low levels in human endothelial cells,
smooth muscle cells, and skin fibroblasts, as determined by 
Northern analysis. 

The amino acid sequence of hepsin, deduced from the nucleotide
sequence of its cDNA, is very similar to other serine proteases,
especially in those regions that are highly conserved among this
group of enzymes. It contains His, Asp, and Ser residues at
positions 203, 257, and 353, respectively. These amino acids are
analogous to the His57, Asp102, and Ser195 residues in
chymotrypsin that constitute the catalytic triad essential for
enzymatic activity (Blow et al., 1969). The presence of an Asp (as
opposed to a Ser) at position 347 suggests that hepsin possesses a
substrate specificity similar to that of trypsin (Steitz et al.,
1969; Hartley, 1970). This residue is thought to contribute to
substrate binding in the active site of serine proteases and, for
trypsin-like serine proteases, results in a preference for basic
amino acids. The cDNA sequence predicts an Arg-Ile-Val-Gly-Gly
activation site sequence (residues 162-166). This suggests that
hepsin is synthesized as an inactive zymogen which is converted to
an active serine protease by cleavage of the Arg162-Ile163 peptide
bond. The resulting active serine protease would consist of two
chains, including a noncatalytic chain (residues 1-162) derived
from the amino-terminal end of the zymogen and a catalytic chain
(residues 163-417) derived from the carboxyl-terminal end. By
analogy with the various plasma serine proteases, the Cys residues
at positions 153 and 277 in the noncatalytic and catalytic chains,
respectively, could be expected to form a disulfide bond that
holds the two chains together. A computer search of the protein
sequence database (National Biomedical Research Foundation,
Washington, DC) showed that a portion of hepsin differs
substantially from all serine proteases for which there is
sequence data available. These data also showed that the
noncatalytic chain is unique among known protein sequences except
for its extreme carboxyl-terminal region. This portion of the
noncatalytic chain shares some sequence similarity with
corresponding regions in four of the vitamin K dependent serine
proteases (Figure 4). Conversely, the catalytic chain of hepsin
exhibits a high degree of similarity with the catalytic chains of
other serine proteases (Figure 5).

[Figure 4: sequence alignment of hepsin, factor X, protein C,
factor VII, and factor IX]

FIGURE 4: Comparison of the carboxyl-terminal end of the
noncatalytic chain of hepsin with corresponding regions in the
noncatalytic chains of factor X (McMullen et al., 1983), protein C
(Foster & Davie, 1984), factor VII (Hagen et al., 1986), and
factor IX (Kurachi & Davie, 1982). Gaps have been inserted to
bring the protein sequences into better alignment. The numbers in
parentheses refer to the location of the sequence in that
particular protein. Amino acids are boxed if they are found at the
same location in hepsin and one or more of the other proteins.
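
The proposed activation step, cleavage after the Arg that precedes Ile-Val-Gly-Gly, can be expressed as a short Python sketch. It is illustrative only: the function name `split_zymogen` and the toy peptide are hypothetical, and the hepsin sequence itself is not reproduced. For hepsin the text gives chains of residues 1-162 (162 residues) and 163-417 (255 residues).

```python
import re

# Arg followed by Ile-Val-Gly-Gly; the lookahead keeps IVGG out of the match
# so that the cut falls directly after the Arg residue.
ACTIVATION_MOTIF = re.compile(r"R(?=IVGG)")


def split_zymogen(protein):
    """Split a one-letter protein sequence after the first Arg that is
    followed by IVGG. Returns (noncatalytic, catalytic) or None."""
    match = ACTIVATION_MOTIF.search(protein.upper())
    if match is None:
        return None
    cut = match.end()
    return protein[:cut], protein[cut:]


# Hypothetical toy peptide, not the hepsin sequence.
print(split_zymogen("MKLCPVDRIVGGAHC"))
```

The first element corresponds to the noncatalytic chain and the second to the catalytic chain; a disulfide bond between the two is not modeled.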

When the primary structures of the catalytic chains of 
different serine proteases are compared, the pattern that 
emerges is one of small stretches of highly similar sequence 
occurring at various intervals along the polypeptide chain 
(Hartley & Shotton, 1971). Furthermore, internal residues
are much more highly conserved than external ones. In their
analysis of the catalytic chains of several serine proteases, Furie 
et al. (1982) identified seven conserved regions separated by 
six variable regions. The variable regions, which show little 
conservation of sequence, in addition to containing short de- 
letions and insertions, are thought to be located on the surface 
of the protein. This helps to explain why the internal structures 
and active sites of different serine proteases appear similar, 
whereas their surfaces, which play a major role in determining 
their unique substrate specificities, vary considerably. By 
comparing the amino acid sequence of the catalytic chain of 
hepsin with those of other serine proteases (Figure 5), it is 
apparent that hepsin also follows the same pattern of conserved
and variable regions. 

The highly basic sequence Arg-Arg-Lys (residues 155-157) just
prior to the apparent activation site is similar to the basic
sequences that also precede the activation sites in human factor
X (Leytus et al., 1984) and protein C (Foster & Davie, 1984).
Factor X and protein C are synthesized as single-chain precursors
and are converted to two-chain zymogens by cleavage and release of
these basic residues. Subsequent cleavages at the activation sites
for factor X and protein C release short activation peptides and
result in the generation of an active serine protease. If the
analogy is extended to include hepsin, it seems possible that this
protein may also exist as a two-chain zymogen that releases a
short peptide (e.g., Leu-Pro-Val-Asp-Arg) upon its conversion to
an active enzyme.

Compared with other serine proteases, the number and positions of
9 out of the 10 cysteine residues in the catalytic chain of hepsin
are highly conserved. On the basis of the known disulfide bridge
arrangement in chymotrypsin (Keil et al., 1963; Brown & Hartley,
1966), trypsin (Kauffman, 1965), prothrombin (Magnusson et al.,
1975), plasmin (Sottrup-Jensen et al., 1978; Wiman, 1977), and
factor X (Hojrup & Magnusson, 1987), and by analogy with other
serine proteases, four intrachain disulfide bonds at cysteine
pairs 188/204, 291/359, 322/338, and 349/381 would be expected. In
addition, Cys277 is probably involved in a disulfide linkage with
the noncatalytic chain. The remaining Cys372 has no analogous
counterpart in other serine proteases. One possibility is that
this extra Cys may participate in an interchain disulfide bridge
between two monomers of hepsin, analogous to that proposed for
factor XI (Fujikawa et al., 1986). In the noncatalytic chain of
hepsin, the cDNA sequence predicts the presence of nine Cys
residues. Cys153 is probably involved in the disulfide linkage
with the catalytic chain. This leaves an even number of Cys
residues in the noncatalytic chain that could form intrachain
disulfide bonds.

From crystallographic and kinetic studies of chymotrypsin
and trypsin and from knowledge of their primary structures, 
it has been possible to identify residues in these enzymes that 
are involved in substrate binding and catalysis [reviewed in 
Birktoft et al. (1970), Hartley and Shotton (1971), and Kraut 
(1977)]. Since some of these residues are essential for proper
function, it was of interest to make a more detailed comparison 
with hepsin (Figure 5) and to determine whether hepsin 
possessed these same essential residues. 

(a) During the conversion of chymotrypsinogen to chymo- 
trypsin, the peptide backbone of segment 187-193 becomes 
more extended, resulting in the creation of a substrate binding 
pocket (Kraut, 1971). The peptide backbone of residues 
Ser189-Ser190-Cys191-Met192 forms one side of this substrate
binding pocket in chymotrypsin (Steitz et al., 1969). This
sequence is Asp189-Ser190-Cys191-Gln192 in trypsin and
Asp189-Ala190-Cys191-Gln192 in hepsin.

(b) The opposite side of the substrate binding pocket in 
chymotrypsin is lined by residues Ser214-Trp215-Gly216. The
peptide backbone of these residues is thought to interact with 
the side chains of the substrate for properly orienting the bond 
that is to be cleaved (Steitz et al., 1969). This stretch of amino 
acids is also present in hepsin. 

(c) Hydrogen bonding between Cys191/Asp194 and Asp194/Gly197
provides a rigid structure in the peptide backbone of
chymotrypsin in the vicinity of the active site. This helps to
hold the active-site Ser195 in the proper orientation and is
maintained only if Gly residues are present at positions 193 and
196 (Birktoft et al., 1970). Hepsin also has Gly residues at these
two positions.







[Figure 5: alignment of the catalytic chain of hepsin with those
of chymotrypsin, trypsin, factor X, prothrombin, tPA, plasmin,
factor XII, plasma kallikrein, complement C1r, cathepsin G, tonin,
adipsin, and B factor, with conserved regions CR1-CR7 marked and
residue numbering along the top]

of seven conserved regions (CR1-7) are essentially the same as
those designated by Furie et al. (1982). Since variable regions
show minimal sequence conservation, little attempt was made to
optimize the homology in these regions. Otherwise, gaps have been
inserted to bring the sequences into better alignment. Asterisks
have been placed above the active-site residues His57, Asp102, and
Ser195 that compose the catalytic triad. An arrow indicates the
location of the extra Cys residue in the sequence of hepsin.
Residues are underlined when the same amino acid is found at the
same position in hepsin. The percentage listed in parentheses at
the end of each sequence represents the extent of similarity
between hepsin and that protein, as calculated from this alignment.
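
A percentage of this kind can be computed directly from two rows of an alignment. The sketch below is one plausible way to do it and is not necessarily the calculation used for the figure: it counts identical residues over the columns where neither sequence has a gap, and the function name `percent_identity` and the two toy rows are hypothetical.

```python
def percent_identity(reference, other, gap="-"):
    """Percent of identical residues between two aligned sequences of
    equal length, counted over columns in which neither has a gap."""
    if len(reference) != len(other):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(reference.upper(), other.upper()):
        if a == gap or b == gap:
            continue  # skip columns opened by an insertion or deletion
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments, not rows taken from the figure.
print(percent_identity("GDSGGP-LV", "GDSGGPFVC"))
```

Other denominators (for example the full length of the reference chain) would give somewhat different values, so the choice should be stated alongside any reported percentage.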



(d) All acidic (Asp and Glu) and basic (Arg, Lys, and His) side
chains are placed on the surface of chymotrypsin, with the
exception of Asp102 and Asp194, which are buried in the interior
of the molecule. In trypsin, there is an additional buried acidic
side chain at Asp189. Hepsin contains the two buried Asp residues
common to both chymotrypsin and trypsin, namely, Asp102 and
Asp194. In addition, at the position which has the greatest
influence on substrate specificity (position 189), hepsin contains
an Asp residue. Thus, it is predicted that hepsin would have a
preference for substrates with basic




side chains. It is of interest to note that Shotton and Watson
(1970) made the prediction that a basic residue at position
189 might result in a serine protease with a preference for
acidic side chains.

(e) In the three-dimensional model for elastase, the side
chains of Val216 and Thr226, replacing Gly216 and Gly226 in
chymotrypsin and trypsin, block the entrance of hydrophobic
or charged substrates with bulky side chains from the binding
pocket (Shotton & Hartley, 1970; Shotton & Watson, 1970).
In hepsin, the presence of Gly residues at positions 216 and
226 is preserved.

(f) The side chain of residue 192 has been described as being
a flexible cover to the entrance of the substrate binding pocket
in chymotrypsin (Steitz et al., 1969) and trypsin (Krieger et
al., 1974). In chymotrypsin, Met192 may help provide a
nonpolar environment for substrate side chains, whereas in
trypsin Gln192 may provide a more polar environment. In
hepsin, position 192 is Gln.

(g) The sequence Gly140-Trp141-Gly142 is highly conserved in
serine proteases and is presumed to be involved in the activation
process (Fehlhammer et al., 1977). This sequence is also present
in hepsin.

The absence of a typical signal peptide and the presence of a
potential transmembrane domain in hepsin are analogous to several
other proteins recently described. Asialoglycoprotein receptor
(Holland et al., 1984), transferrin receptor (Schneider et al.,
1984), and plasma cell membrane glycoprotein PC-1 (van Driel &
Goding, 1987) are examples of transmembrane proteins which lack a
typical amino-terminal signal peptide that is cleaved during
biosynthesis. These proteins possess hydrophobic domains near
their amino termini which are thought to function as internal
signal sequences. The hydrophobic domains direct insertion of
these proteins into the membrane of the endoplasmic reticulum,
leaving the amino terminus facing the cytoplasm and the carboxyl
terminus facing into the lumen of the endoplasmic reticulum
(Holland & Drickamer, 1986; Zerial et al., 1986; Wickner & Lodish,
1985; Spiess & Lodish, 1986). If a protein with a
membrane-spanning domain is ultimately destined for the plasma
membrane, its orientation at the cell surface is determined by the
mechanism by which it was inserted into the membrane of the
endoplasmic reticulum. For the cases mentioned above, the amino
terminus faces the cytoplasm, whereas the carboxyl terminus is
extracellular. The lack of an amino-terminal signal sequence and
the presence of an internal hydrophobic domain in hepsin suggest
that it is synthesized and integrated into membranes in a manner
similar to the above-mentioned group of transmembrane proteins. If
this were the case, then one would predict that the
carboxyl-terminal catalytic chain of hepsin would be on the
outside of the cell. There are many processes occurring
extracellularly near the cell surface that involve limited
proteolysis. Although these have not yet been well characterized,
an activatable, trypsin-like, transmembrane serine protease may be
an important participant in some of these processes.
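
An internal hydrophobic domain of the kind discussed above is commonly located with a sliding-window hydropathy profile. The sketch below uses the Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982); the window of 19 residues is a commonly used setting for membrane-spanning segments and is an assumption here, as are the function name and the toy sequence. It is not presented as the procedure used in this study.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Return the mean hydropathy of every window of `window` residues.
    Element i describes residues i+1 .. i+window (1-based numbering)."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical toy sequence: charged flanks around a 19-residue Leu stretch.
toy = "DDDD" + "L" * 19 + "KKKK"
profile = hydropathy_profile(toy)
peak = profile.index(max(profile))
print(peak + 1, round(max(profile), 2))
```

A sustained peak near the amino terminus of a sequence lacking a cleavable signal peptide would be consistent with the internal signal sequence arrangement described in the text.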

It is difficult to speculate as to the true physiological function
of hepsin. Since it may be a membrane-associated protein, it
probably is not participating in such processes as coagulation,
fibrinolysis, complement activation, etc., unless it is also being
expressed by endothelial or blood cells. Since liver cells
synthesize and secrete many different proteins, hepsin might be
involved in the modification of other proteins as they are being
synthesized or secreted. This could include the removal of
propeptides from hormones, growth factors, or the vitamin K
dependent proteases or the activation or inactivation of other
proteins. It is unclear, however, how hepsin is converted from a
zymogen to an active enzyme and whether this involves another
serine protease or whether hepsin is capable of autoactivation.
Answers to these questions will require additional
experimentation.

Acknowledgments 

We thank Drs. Akitada Ichinose, Jose Lopez, Kazuo Fujikawa, and
Dominic Chung for valuable discussions and advice. We also thank
Lois Swenson for her assistance in the preparation of the
manuscript.

References 

Arlaud, G. J., & Gagnon, J. (1983) Biochemistry 22, 1758-1764.
Benton, W. D., & Davis, R. W. (1977) Science (Washington, D.C.) 196, 180-182.
Biggin, M. D., Gibson, T. J., & Hong, G. F. (1983) Proc. Natl. Acad. Sci. U.S.A. 80, 3963-3965.
Billings, P. C., Carew, J. A., Keller-McGandy, C. E., Goldberg, A. L., & Kennedy, A. R. (1987) Proc. Natl. Acad. Sci. U.S.A. 84, 4801-4805.
Birktoft, J. J., Blow, D. M., Henderson, R., & Steitz, T. A. (1970) Philos. Trans. R. Soc. London, B 257, 67-76.
Birnboim, H. C., & Doly, J. (1979) Nucleic Acids Res. 7, 1513-1523.
Blow, D. M., Birktoft, J. J., & Hartley, B. S. (1969) Nature (London) 221, 337-340.
Breathnach, R., & Chambon, P. (1981) Annu. Rev. Biochem. 50, 349-383.
Brown, J. R., & Hartley, B. S. (1966) Biochem. J. 101, 214-228.
Chandra, T., Stackhouse, R., Kidd, V. J., & Woo, S. L. C. (1983) Proc. Natl. Acad. Sci. U.S.A. 80, 1845-1848.
Christman, J. K., Silverstein, S. C., & Acs, G. (1977) in Proteinases in Mammalian Cells and Tissues (Barrett, A. J., Ed.) pp 91-149, Elsevier, Amsterdam and New York.
Chung, D. W., Fujikawa, K., McMullen, B. A., & Davie, E. W. (1986) Biochemistry 25, 2410-2417.
Collen, D. (1980) Thromb. Haemostasis 43, 77-89.
Cook, K. S., Groves, D. L., Min, H. Y., & Spiegelman, B. M. (1985) Proc. Natl. Acad. Sci. U.S.A. 82, 6480-6484.
Cook, K. S., Min, H. Y., Johnson, D., Chaplinsky, R. J., Flier, J. S., Hunt, C. R., & Spiegelman, B. M. (1987) Science (Washington, D.C.) 237, 402-405.
Cromlish, J. A., Seidah, N. G., & Chretien, M. (1986) J. Biol. Chem. 261, 10850-10858.
Davie, E. W., Fujikawa, K., Kurachi, K., & Kisiel, W. (1979) Adv. Enzymol. Relat. Areas Mol. Biol. 48, 277-318.
Dayhoff, M. O. (1979) in Atlas of Protein Sequence and Structure (Dayhoff, M. O., Ed.) Vol. 5, Suppl. 3, pp 1-8, National Biomedical Research Foundation, Washington, DC.
Dayhoff, M. O., Barker, W. C., & Hunt, L. T. (1983) Methods Enzymol. 91, 524-545.
Degen, S. J. F., MacGillivray, R. T. A., & Davie, E. W. (1983) Biochemistry 22, 2087-2097.
Eisenhauer, D. A., & McDonald, J. K. (1986) J. Biol. Chem. 261, 8859-8865.
Eyl, A., & Inagami, T. (1970) Biochem. Biophys. Res. Commun. 38, 149-155.
Fehlhammer, H., Bode, W., & Huber, R. (1977) J. Mol. Biol. 111, 415-438.
Foster, D., & Davie, E. W. (1984) Proc. Natl. Acad. Sci. U.S.A. 81, 4766-4770.




Fujikawa, K., & McMullen, B. A. (1983) J. Biol. Chem. 258, 10924-10933.
Fujikawa, K., Chung, D. W., Hendrickson, L. E., & Davie, E. W. (1986) Biochemistry 25, 2417-2424.
Furie, B., Bing, D. H., Feldmann, R. J., Robison, D. J., Burnier, J. P., & Furie, B. C. (1982) J. Biol. Chem. 257, 3875-3882.
Gergen, J. P., Stern, R. H., & Wensink, P. C. (1979) Nucleic Acids Res. 7, 2115-2136.
Gershenfeld, H. K., & Weissman, I. L. (1986) Science (Washington, D.C.) 232, 854-858.
Hagen, F. S., Gray, C. L., O'Hara, P., Grant, F. J., Saari, G. C., Woodbury, R. G., Hart, C. E., Insley, M., Kisiel, W., Kurachi, K., & Davie, E. W. (1986) Proc. Natl. Acad. Sci. U.S.A. 83, 2412-2416.
Hartley, B. S. (1964) Nature (London) 201, 1284-1287.
Hartley, B. S. (1970) Philos. Trans. R. Soc. London, B 257, 77-87.
Hartley, B. S., & Kauffman, D. L. (1966) Biochem. J. 101, 229-231.
Hartley, B. S., & Shotton, D. M. (1971) Enzymes (3rd Ed.) 3, 323-373.
Hojrup, P., & Magnusson, S. (1987) Biochem. J. 245, 887-892.
Holland, E. C., & Drickamer, K. (1986) J. Biol. Chem. 261, 1286-1292.
Holland, E. C., Leung, J. O., & Drickamer, K. (1984) Proc. Natl. Acad. Sci. U.S.A. 81, 7338-7342.
Kauffman, D. L. (1965) J. Mol. Biol. 12, 929-932.
Keil, B., Prusik, Z., & Sorm, F. (1963) Biochim. Biophys. Acta 78, 559-561.
Kozak, M. (1984) Nucleic Acids Res. 12, 857-872.
Kozak, M. (1986) Cell (Cambridge, Mass.) 44, 283-292.
Kraut, J. (1971) Enzymes (3rd Ed.) 3, 165-183.
Kraut, J. (1977) Annu. Rev. Biochem. 46, 331-358.
Krieger, M., Kay, L. M., & Stroud, R. M. (1974) J. Mol. Biol. 83, 209-230.
Kurachi, K., & Davie, E. W. (1982) Proc. Natl. Acad. Sci. U.S.A. 79, 6461-6464.
Kyte, J., & Doolittle, R. F. (1982) J. Mol. Biol. 157, 105-132.
LaBombardi, V. J., Shaw, E., DiStefano, J. F., Beck, G., Brown, F., & Zucker, S. (1983) Biochem. J. 211, 695-700.
Lazure, C., Leduc, R., Seidah, N. G., Thibault, G., Genest, J., & Chretien, M. (1984) Nature (London) 307, 555-558.
Leytus, S. P., Chung, D. W., Kisiel, W., Kurachi, K., & Davie, E. W. (1984) Proc. Natl. Acad. Sci. U.S.A. 81, 3699-3702.
Leytus, S. P., Kurachi, K., Sakariassen, K. S., & Davie, E. W. (1986a) Biochemistry 25, 4855-4863.
Leytus, S. P., Foster, D. C., Kurachi, K., & Davie, E. W. (1986b) Biochemistry 25, 5098-5102.
Lobe, C. G., Finlay, B. B., Paranchych, W., Paetkau, V. H., & Bleackley, R. C. (1986) Science (Washington, D.C.) 232, 858-861.
Lundgren, S., Ronne, H., Rask, L., & Peterson, P. A. (1984) J. Biol. Chem. 259, 7780-7784.
Magnusson, S., Petersen, T. E., Sottrup-Jensen, L., & Claeys, H. (1975) in Proteases and Biological Control (Reich, E., Rifkin, D. B., & Shaw, E., Eds.) pp 123-249, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.
Malinowski, D. P., Sadler, J. E., & Davie, E. W. (1984) Biochemistry 23, 4243-4250.
Maniatis, T., Fritsch, E. F., & Sambrook, J. (1982) in Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.



Masson, D., & Tschopp, J. (1987) Cell (Cambridge, Mass.) 49, 679-685.
Maxam, A. M., & Gilbert, W. (1980) Methods Enzymol. 65, 499-560.
McMullen, B. A., Fujikawa, K., Kisiel, W., Sasagawa, T., Howald, W. N., Kwa, E. Y., & Weinstein, B. (1983) Biochemistry 22, 2875-2884.
Meloun, B., Kluh, I., Kostka, V., Moravek, L., Prusik, Z., Vanecek, J., Keil, B., & Sorm, F. (1966) Biochim. Biophys. Acta 130, 543-546.
Messing, J. (1983) Methods Enzymol. 101, 20-78.
Micard, D., Sobrier, M. L., Couderc, J. L., & Dastugue, B. (1985) Anal. Biochem. 148, 121-126.
Mikes, O., Holeysovsky, V., Tomasek, V., & Sorm, F. (1966) Biochem. Biophys. Res. Commun. 24, 346-352.
Neurath, H., & Walsh, K. A. (1976) Proc. Natl. Acad. Sci. U.S.A. 73, 3825-3832.
Nevins, J. R. (1983) Annu. Rev. Biochem. 52, 441-446.
Pasternack, M. S., Verret, C. R., Liu, M. A., & Eisen, H. N. (1986) Nature (London) 322, 740-743.
Pennica, D., Holmes, W. E., Kohr, W. J., Harkins, R. N., Vehar, G. A., Ward, C. A., Bennett, W. F., Yelverton, E., Seeburg, P. H., Heyneker, H. L., & Goeddel, D. V. (1983) Nature (London) 301, 214-221.
Proudfoot, N. J., & Brownlee, G. G. (1976) Nature (London) 263, 211-214.
Reid, K. B. M., & Porter, R. R. (1981) Annu. Rev. Biochem. 50, 433-464.
Salvesen, G., Farley, D., Shuman, J., Przybyla, A., Reilly, C., & Travis, J. (1987) Biochemistry 26, 2289-2293.
Sanger, F., Nicklen, S., & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U.S.A. 74, 5463-5467.
Schneider, C., Owen, M. J., Banville, D., & Williams, J. G. (1984) Nature (London) 311, 675-678.
Shotton, D. M., & Hartley, B. S. (1970) Nature (London) 225, 802-806.
Shotton, D. M., & Watson, H. C. (1970) Nature (London) 225, 811-816.
Sottrup-Jensen, L., Claeys, H., Zajdel, M., Petersen, T. E., & Magnusson, S. (1978) Prog. Chem. Fibrinolysis Thrombolysis 3, 191-209.
Spiess, M., & Lodish, H. F. (1986) Cell (Cambridge, Mass.) 44, 177-185.
Steitz, T. A., Henderson, R., & Blow, D. M. (1969) J. Mol. Biol. 46, 337-348.
Tanaka, K., Nakamura, T., & Ichihara, A. (1986) J. Biol. Chem. 261, 2610-2615.
van Driel, I. R., & Goding, J. W. (1987) J. Biol. Chem. 262, 4882-4887.
Vieira, J., & Messing, J. (1982) Gene 19, 259-268.
Walsh, K. A., & Neurath, H. (1964) Proc. Natl. Acad. Sci. U.S.A. 52, 884-889.
Watt, K. W. K., Lee, P.-J., M'Timkulu, T., Chan, W.-P., & Loor, R. (1986) Proc. Natl. Acad. Sci. U.S.A. 83, 3166-3170.
Wickner, W. T., & Lodish, H. F. (1985) Science (Washington, D.C.) 230, 400-407.
Wiman, B. (1977) Eur. J. Biochem. 76, 129-137.
Young, J. D.-E., Leong, L. G., Liu, C.-C., Damiano, A., Wall, D. A., & Cohn, Z. A. (1986) Cell (Cambridge, Mass.) 47, 183-194.
Zerial, M., Melancon, P., Schneider, C., & Garoff, H. (1986) EMBO J. 5, 1543-1550.








Exhibit 20 



The Journal of Biological Chemistry

© 1999 by The American Society for Biochemistry and Molecular Biology, Inc.

Vol. 274, No. 26, Issue of June 25, pp. 18231-18236, 1999

Printed in U.S.A.



Molecular Cloning of cDNA for Matriptase, a Matrix-degrading 
Serine Protease with Trypsin-like Activity* 

(Received for publication, November 23, 1998, and in revised form, April 8, 1999) 

Chen-Yong Lin, Joanna Anders, Michael Johnson, Qingxiang Amy Sang‡, and
Robert B. Dickson§

From the Lombardi Cancer Center, Georgetown University Medical Center, Washington, D. C. 20007



A major protease from human breast cancer cells was previously
detected by gelatin zymography and proposed to play a role in
breast cancer invasion and metastasis. To structurally
characterize the enzyme, we isolated a cDNA encoding the protease.
Analysis of the cDNA reveals three sequence motifs: a
carboxyl-terminal region with similarity to the trypsin-like
serine proteases, four tandem cysteine-rich repeats homologous to
the low density lipoprotein receptor, and two copies of tandem
repeats originally found in the complement subcomponents C1r and
C1s. By comparison with other serine proteases, the active-site
triad was identified as His-484, Asp-539, and Ser-633. The
protease contains a characteristic Arg-Val-Val-Gly-Gly motif that
may serve as a proteolytic activation site. The bottom of the
substrate specificity pocket was identified to be Asp-627 by
comparison with other trypsin-like serine proteases. In addition,
this protease exhibits trypsin-like activity as defined by
cleavage of synthetic substrates with Arg or Lys as the P1 site.
Thus, the protease is a mosaic protein with broad spectrum
cleavage activity and two potential regulatory modules. Given its
ability to degrade extracellular matrix and its trypsin-like
activity, the name matriptase is proposed for the protease.
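
Trypsin-like specificity, as defined above, means cleavage on the carboxyl side of an Arg or Lys residue occupying the P1 position. The Python sketch below lists such candidate sites in a peptide. It is a deliberate simplification: it ignores every other subsite preference, and the function name `p1_cleavage_sites` and the toy peptide are hypothetical rather than taken from the paper.

```python
def p1_cleavage_sites(peptide):
    """Return the 1-based positions of Arg (R) or Lys (K) residues that
    could serve as P1, i.e. that are followed by at least one residue."""
    peptide = peptide.upper()
    last_index = len(peptide) - 1
    return [
        index + 1
        for index, residue in enumerate(peptide)
        if residue in "RK" and index < last_index
    ]


# Hypothetical toy peptide: cleavage would be expected after R3 and K5.
print(p1_cleavage_sites("GARLKDS"))
```

A carboxyl-terminal Arg or Lys is excluded because there is no peptide bond after it to cleave.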



Elevated proteolytic activity has been implicated in neoplastic
progression. Although the exact role(s) of proteolytic enzymes in
tumor progression remain unclear, it seems that proteases may be
involved in almost every step of the development and spread of
cancer. A widely proposed view is that proteases contribute to the
degradation of extracellular matrix and to tissue remodeling and are
necessary for cancer invasion and metastasis. A wide array of
extracellular matrix-degrading proteases have been discovered, the
expression of some of which correlates with tumor progression, as
reviewed by Mignatti and Rifkin (1). The plasmin/urokinase-type plas-



* This work was supported by Specialized Program of Research Ex- 
cellence in Breast Cancer Grant 1P50CA58158 (to R. B. D.) from the 
National Institutes of Health and by the Elsa U. Pardee Foundation (to 
Q. A. S.). Work performed at the Lombardi Cancer Center Macromolec- 
ular Synthesis and Sequencing Shared Resource was supported by 
National Institutes of Health Grant P30-CA51008. Cells and reagents 
were obtained from the Lombardi Cancer Center Tissue Culture Re- 
source, supported by National Institutes of Health Grant P30-CA51005. 
The costs of publication of this article were defrayed in part by the 
payment of page charges. This article must therefore be hereby marked
"advertisement" in accordance with 18 U.S.C. Section 1734 solely to
indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted
to the GenBank™/EBI Data Bank with accession number(s) AF118224.

‡ Present address: Dept. of Chemistry, Florida State University,
Tallahassee, FL 32306.

§ To whom correspondence and reprint requests should be addressed:
Lombardi Cancer Center, Georgetown University Medical Center,
Washington, D. C. 20007. Tel.: 202-687-4304; Fax: 202-687-7505.



minogen activator system and the 72-kDa gelatinase (MMP-2)/ 
membrane-type MMP system have received the most attention 
for their potential roles in the process of invasion of breast 
cancer and other carcinomas. However, both systems appear to 
be largely synthesized by stromal cells in vivo (2-5) and require 
indirect mechanisms for their recruitment and activation on 
the surfaces of cancer cells. The stromal origins of these well 
characterized extracellular matrix-degrading proteases may 
suggest that cancer invasion is an event that either depends 
entirely upon stromal-epithelial cooperation or is controlled by 
some other unknown epithelium-derived protease(s). A search 
for these epithelium-derived proteolytic systems that may in- 
teract with the plasmin/urokinase-type plasminogen activator 
system and/or with the MMP family could provide a missing 
link in our understanding of malignant invasion. 

We have pursued studies of a novel protease with the hy- 
pothesis that a tumor itself may be a major source of proteases 
important for multiple aspects of malignant behavior, includ- 
ing invasion and metastasis. To this end, we systematically 
altered several conditions such as the pH using gelatin zymog- 
raphy to search for potentially important breast cancer cell- 
derived gelatinases. This search led us to the discovery of a 
major protease, which on a gelatin zymogram had a slightly 
alkaline pH optimum and a size between those of MMP-2 and 
MMP-9 in T-47D human breast cancer cells (6). We now pro- 
pose to call this protease matriptase. Matriptase has been 
purified from T-47D cell-conditioned medium and has been 
used as an immunogen to produce monoclonal antibodies (7). 
Although matriptase was initially isolated from cell-condi- 
tioned medium, three lines of evidence, including immunoflu- 
orescence staining, surface biotinylation, and subcellular frac- 
tionation, suggested that a portion of the enzyme molecules 
were localized on the surfaces of cells. Given its extracellular 
matrix-degrading activity and presentation on the surfaces of 
breast cancer cells, we hypothesize that matriptase may be 
involved in breast cancer invasion. To further characterize the 
newly discovered matrix-degrading protease in this study, we 
have purified the enzyme and its binding protein from human 
milk, a biological source of relatively high abundance. A cDNA 
clone for matriptase has now been generated and characterized. 

MATERIALS AND METHODS 

Cell Lines and Culture Conditions — COS-7 cells were maintained in 
modified Iscove's minimal essential medium (Biofluids, Inc., Rockville,
MD) supplemented with 5% fetal calf serum (Life Technologies, Inc.), 

Purification of Matriptase — To obtain enough matriptase for amino 
acid sequencing, the enzyme was isolated from human milk (39). 
Briefly, human milk from the Georgetown University Medical Center 
Milk Bank was precipitated and collected by addition of ammonium 
sulfate between 40 and 60% saturation. Matriptase was purified by a 
combination of CM-Sepharose and immunoaffinity chromatography. 

Amino Acid Sequence Analysis — To obtain internal amino acid se- 
quences, purified matriptase was separated by SDS-polyacrylamide gel 
electrophoresis and lightly stained with Coomassie Blue, and protein 



This paper is available on line at http://www.jbc.org 






Matriptase, a Trypsin-like Serine Protease 




[Fig. 1 image: gel panels with lanes run without (-) or with (+) boiling; labeled bands: 95-kDa complex and BPs.]

Fig. 1. Purification of matriptase in its 95-kDa complexed form from
human milk. The partially purified 95-kDa matriptase complex from
ion-exchange chromatography was loaded onto a mAb 21-9-Sepharose
column. The bound proteins were eluted by glycine buffer, pH 2.4, and
neutralized by addition of 2 M Trizma. The eluted proteins were
incubated in 1× SDS sample buffer in the absence of reducing agents at
room temperature (lanes 1; -Boil) or at 95 °C (lanes 2; +Boil) for
5 min. The samples were resolved by SDS-polyacrylamide gel
electrophoresis and either stained by colloidal Coomassie (A) or
subjected to immunoblot analysis using mAb 21-9 (B) or gelatin
zymography (C). The 95-kDa matriptase complex was eluted from this
affinity column as the major protein (A, lane 1); it was recognized by
mAb 21-9 (B, lane 1); and it also exhibited gelatinolytic activity
(C, lane 1). The 95-kDa matriptase complex was converted to matriptase
by boiling (A, lane 2). The gelatinolytic activity of the 95-kDa
protease was destroyed by boiling, but a low level of the gelatinolytic
activity survived and was converted to matriptase (C, lane 2). A low
level of uncomplexed matriptase was copurified with the 95-kDa
matriptase complex by affinity chromatography (A, lane 1); it also
exhibited gelatinolytic activity (C, lane 1). Immunoblot analysis
enhanced the signal of the uncomplexed matriptase and reconfirmed its
existence (B, lane 1). Several other polypeptides were also seen
(A, lanes 1 and 2). Some of them could be degradation products of the
protease since they were recognized by mAb 21-9 after longer exposure
to the x-ray film. A 40-kDa protein doublet was seen in low levels in a
nonboiled sample (A, lane 1), but its levels were increased after
boiling (A, lane 2). This 40-kDa doublet was not recognized by mAb 21-9
(B). We propose that these two polypeptides could be binding proteins
(BPs) of matriptase. The sizes of the molecular mass markers are
indicated.



bands were excised. Matriptase was then subjected to in-gel digestion
and amino acid sequencing at the Howard Hughes Medical Institute
Biopolymer Laboratory and W. M. Keck Foundation Biotechnology Resource
Laboratory at Yale University. The amino-terminal sequences were
determined as described previously (8). Briefly, the proteins were
resolved by SDS-polyacrylamide gel electrophoresis, transferred to
polyvinylidene difluoride membrane, and lightly stained with Coomassie
Blue. The proteins were then excised and subjected to amino-terminal
sequencing in the Chemistry Department of Florida State University
(Tallahassee, FL). The two short sequences obtained were identical to
a deduced amino acid sequence from a cDNA termed SNC19 (GenBank™
accession number U20428).

Amplification of an SNC19 cDNA from T-47D Breast Cancer Cells — An
SNC19 cDNA clone was generated by reverse transcriptase-polymerase
chain reaction utilizing mRNA from T-47D human breast cancer cells.
Primer sequences for SNC19 (5'-CCTCCTCTTGGTCTTGCTGGGG-3' and
5'-AGACCCGTCTGmTCCAGG-3') were derived from the published sequence.
Standard reverse transcription-polymerase chain reaction was conducted
using the Advantage RT-PCR kit (CLONTECH). Products were analyzed on a
0.8% agarose gel; and the resultant band of ~2.8 kilobase pairs,
corresponding to the expected product size, was excised from the gel,
purified, and ligated into pCR2.1 (Invitrogen, San Diego, CA) by TA
cloning (pCR-SNC19).

Sequencing — DNA sequencing was performed on an Applied Biosystems
automated 377 DNA sequencer using standard methods, with the
assistance of the Lombardi Cancer Center Sequencing and Synthesis
Shared Resource. The sequences were assembled and analyzed with
Lasergene software for Windows (DNASTAR, Inc., Madison, WI). The
predicted protein sequence was compared with sequences in the
Swiss-Prot data base at the National Center for Biotechnology
Information using the BLAST network server.

Fig. 2. Western blot analysis of SNC19 protein expressed in COS cells
using anti-matriptase mAb M32. The SNC19 fragment generated by reverse
transcriptase-polymerase chain reaction was inserted into the
expression vector pcDNA3.1 and transfected into COS-7 cells. Cell
lysates from SNC19-transfected COS-7 cells (lane 1) and control COS-7
cells (lane 2) and the conditioned medium of T-47D human breast cancer
cells (lane 3) were subjected to Western blot analysis using
anti-matriptase mAb M32.

Expression of SNC19 in COS-7 Cells — To verify that SNC19 encodes the
matriptase cDNA, we constructed a eukaryotic expression vector
(pcDNA/SNC19) utilizing the commercially available pcDNA3.1 vector
(Invitrogen, San Diego, CA). A 2.83-kilobase pair EcoRI fragment
containing the SNC19 cDNA was produced by digestion of pCR-SNC19 and
cloned into the EcoRI site of pcDNA3.1. This construct contains the
open reading frame of SNC19 driven by the cytomegalovirus promoter.
Correct insertion of the SNC19 cDNA was verified by restriction
mapping (data not shown). Transfections were carried out using
SuperFect transfection reagent (QIAGEN Inc., Valencia, CA) as specified
in the manufacturer's handbook. After 48 h, the matriptase-transfected
COS-7 cells and the control COS-7 cells, which were transfected with
LacZ to monitor transfection efficiency, were extracted with 1% Triton
X-100 in 20 mM Tris-HCl, pH 7.4.

Immunoblot Analysis — Immunoblotting was conducted as described
previously (7). Proteins were separated by 10% SDS-polyacrylamide gel
electrophoresis, transferred to polyvinylidene fluoride membrane, and
subsequently probed with anti-matriptase mAb¹ M32. Immunoreactive
polypeptides were visualized using peroxidase-labeled secondary
antiserum and the ECL detection system (Amersham Pharmacia Biotech).

Gelatin Zymography — Gelatin zymography was carried out as de- 
scribed previously with some modifications (13). Gelatin (1 mg/ml) as a 
substrate was copolymerized with regular SDS-polyacrylamide gel. 
Electrophoresis was performed at a constant current of 15 mA. The 
gelatin gels were washed three times with phosphate-buffered saline 
containing 2% Triton X-100 and incubated in phosphate-buffered saline 
at 37 °C overnight.

Cleavage of Synthetic Substrates — To demonstrate the trypsin-like
activity of matriptase, various synthetic fluorescent protease
substrates with arginine or lysine as the P1 site were tested with
purified matriptase from human milk. Matriptase was assayed in 20 mM
Tris buffer, pH 8.5, at 25 °C in a volume of 190 µl prior to addition
of 10 µl of 2 mM substrate solution (to a final concentration of
0.1 mM). These substrates included t-butyloxycarbonyl
(Boc)-Gln-Ala-Arg-7-amino-4-methylcoumarin (AMC),
Boc-benzyl-Glu-Gly-Arg-AMC, Boc-Leu-Gly-Arg-AMC,
Boc-benzyl-Asp-Pro-Arg-AMC, Boc-Phe-Ser-Arg-AMC, Boc-Val-Pro-Arg-AMC,
succinyl-Ala-Phe-Lys-AMC, Boc-Leu-Arg-Arg-AMC, Boc-Gly-Lys-Arg-AMC,
and Boc-Leu-Ser-Thr-Arg-AMC. These sub-

¹ The abbreviations used are: mAb, monoclonal antibody; Boc,
t-butyloxycarbonyl; AMC, 7-amino-4-methylcoumarin; LDL, low density
lipoprotein.






Fig. 3. Nucleotide and deduced amino acid sequences of a matriptase
cDNA clone. The primers (20 bases at the 5'-end and 18 bases at the
3'-end) used for reverse transcriptase-polymerase chain reaction are
underlined. Thirty-three bases beyond the 5'-end primer and 92 bases
beyond the 3'-end primer were taken from SNC19 cDNA and incorporated.
The cDNA sequence was translated from the fifth ATG codon in the open
reading frame. Nucleotide and amino acid numbers are shown on the left.
Sequences that agreed with the internal sequences obtained from
matriptase are double-underlined. His-484, Asp-539, and Ser-633 are
boxed and indicate the putative catalytic triad of matriptase.
Potential N-glycosylation sites are indicated (A). An RGD sequence is
indicated (♦).






CGCTGGGTGGTGCTGGCAGCCGIGCTGATCGGCCTrrTmnm rTTnrTnnnn ATrRKrTTrrTrnTKTrnrATiTrrArTi^rrrr 

GACGTGCGTGTCCACAAGGTCTTCAAICGCTACATCAGGATCACAAATGAGAATTTTCTGGArGCCTACGAGAACTCCAACTCCACTGAG 
TTTGTAAGCCTGGCCAGCAAGGTCAAGCACSCGCTGAAGCTGCTGTACAGCSGAGTCCCATTCCTGGGCCCCrACCACAAGGAGTCGGCT 
CTGACGGCCTTCAGCGAGGGCAGCGTCATCGCCTACTACTGGTCTGAGTTCAGCATCCCGCAGCACCTGGTGGAGGAGGCCGAGCGCGTC 
ATGGCCGAGGAGCGCGTAGTCATGCTGCCCCCGCCGGCGCGCTCCCTGAAGTCCTTTGTGGTCACCTCAGTGGTGGCTTTCCCCACGGAC 
MAEERVVMLPPRARSLKSFVVTSVVAFPTO 

TCCAAAACAGTACAGAGGACCCAGGACAACAGCTGCAGCTTTGGCCTGCACGCCCGCGGTGTGGAGCTGATGCGCTTCACCACGCCCGGC 
SKTVORTQDNSCSFGLHARGVELMRFTTPG 

TTCCCTGACAGCCCCTACCCCGCTCATGCCCGCTGCCAGTGGGCCCTGCGGGGGGACGCCGACTCAGTGCTGAGCCTCACCTTCCGCAGC 
FPDSPYPAHARCGWALRGDADSVLSLTFRS 

« « * 

TTTGACCTTGCGTCCTGCGACGAGCGCGGCAGCGACCTGGTGACGGTGTACAACACCCTGAGCCCCATGGAGCCCCACGCCCTGGTGCAG 
FOLASCDERGSDLVTVYNTLSPMEPHALVQ 

TTGTGTGGCACCTACCCTCCCTCCTACAACCTGACCTTCCACTCCTCCCAGAACGTCCTGCTCATCACACTGATAACCAACACTGACCGG 
LCGTYPPSYNLTFHSSONVLL ITLl INTER 

A 

CGGCATCCCGGCTTTGAGGCCACCTTCTTCCAGCTGCCTAGGATGAGCAGCTGTCGAGGCCGCTTACGTAAAGCCCAGGGGACATTCAAC 
RHPGFEATFFOLPRMSSCGGRLRKAOGTFN 

AGCCCCTACTACCCAGGCCACTACCCACCCAACATTGACTGCACATGGAACAI I GAGGTGCCCAACAACC AGCATGTGAAGGTGCGCTTC 
SPYYPGHYPPNIDCTWNIEVPNNOHVKVRF 

AAA I ICI ICrACCTGCTGGAGCCCGGCGTGCCTGCGGGCACCTGCCCCAAGGACTACGTGGAGATCAATCGGGAGAAATACTGCGGAGAG 
KFFYLLEPGVPACTCPK DYVC INGEK Y C G E 

AGGTCCCAGTTCGTCGTCACCAGCAACAGCAACAAGATCACAGTTCGCTTCCACTCAGATCAGTCCTACACCGACACCGGCTTCTTAGCT 
RSOFVVTSNSNK I TVRFHSOQSYTOTGFLA 

GAATACCTCTCCTACGACTCCAGTGACCCATGCCCGGGGCAGTTCACGTGCCGCACGGGGCGGTGTATCCGGAAGGAGCTGCGCTGTGAT 
EYLS YOSSDPCPGOFTCRTGRCIRKELRCD 

GGCTGGGCCGACTGCACCGACCACACCGATGAGCTCAACTGCAGTTGCGACGCCGGCCACCAGTTCACGTGCAA6AACAAGTTCTGCAAG 
GWADCTDHSDELNCSCDAGHOFTCKNKFCtC 

A 

CCCCTCTTCTGGGTCTGCGACAGTGTGAACGACTGCGGAGACAACAGCGACGAGCAGGGGTGCAGTTGTCCGGCCCAGACCTTCAGGTGT 
PLFWVCDSVKDCGDNSDEOGCSCPAOTFRC 

TCCAATGGGAAGTGCCTCTCGAAAAGCCAGCAGTGCAATGGGAAGGACGACTGTGGGGACGGGTCCGACGAGGCCTCCTGCCCCAAGGTG 
SNGkCLSKSOOCNGKDOCGDGSDEASCPKV 

AACGTCGTCACTTGTACCAAACACACCTACCGCTGCCTCAATGGGCTCTGCTTGAGCAAGGGCAACCCTGAGTGTGACGGGAAGGAGGAC 
NVVTCTKHTYRCLNGLCLSKGNPECDCKED 

TGTAGCGACGGCTCAGATGAGAAGGACTGCGACTGTGGGCTGCGGTCATTCACGAGACAGGCTCGTGTTGTTGGGGGCACGGATGCCCAT 
CSOGSDEKDCDCGLRSFTROAR VVGGTDAD 

GAGGGCGAGTGGCCCTGGCAGGTAAGCCTGCATGCTCTCGGCCAGGGCCACATCTGCGGTGCTTCCCTCATCTCTCCCAACTGGCTGGTC 
E G E WPWOVSLHALGOGH I CGASL 1 SPNWLV 

TCTGCCGCACACTGCTACATCGATGACAGAGGATTCAGGTACTCAGACCCCACGCAGTGGACGGCCTTCCTGGGCTTGCACGACCAGAGC 
saa[h]cy IDDRGFRYSDPTOWTAFLGLHDOS 

cagcgcagcgcccctggggtgcaggagcgcaggctcaagcgcatcatctcccaccccttcttcaatgacttcaccttcgactatgacatc 
orsapgvqerrlkri ishpffnoftfdy[o]i 

gcgctgctggagctggagaaaccggcagagtacagciccatggtgcggcccatctgcctgccggacgcctcccatgtcttccctgccggc 

ALLELEKPAEYSSHVRP 1 CLPDASHVFPAG 

AAGGCCATCTGGGTCACGGGCTGGGGACACACCCAGTATGGAGGCACTGGCGCGCTGA7CCTGCAAAAGGGTGAGATCCGCGTCATCAAC 
KAIWV TGWGHTOYGGTGAL ILOKGE IRV IN 

CAGACCACCTGCGAGAACCTCCTGCCCCAGCAGATCACGCCGCGCATGATGTGCGTGGGCTTCCTCAGCGGCGGCGTGGACTCCTGCCAG 
OTTCENLLPOOI TPRflMCVGFLSGGVDSCO 

GGTGATTCCGGGGGACCCCTGTCCAGCGTGGAGGCGGATGGGCGGATCTTCCAGGCCGGTGTGGTGAGCTGGGGAGACGGCTGCGCTCAG 
GOUIJgGPLSSVEAOGR i FOAGVVSWGDGCAQ 

AGGAACAAGCCAGGCGTGTACACAAGGCTCCCTCTGTTTCGGGACTGGATCAAAGAGAACACTGGGGTATAGGGGCCGGGGCCACCCAAA 
RNKPGVYTRLPLFROWIKENTGV'" 

TGTGTACACCTGCGGGGCCACCCATCGTCCACCCCAGTGTGCACGCCTGCAGGCTGGAGACTGGACCGCTGALIGCACCAGCGCCCCCAG 
AACATACACTGTGAACTCAATCTCCAGGGCTCCAAATCTGCCTAGAAAACCTCTCGCTTCCTCAGCCTCCAAAGTGGAGCTGGGAGGTAG 
AAGGGGAGGACAC7GGTGGTTCTACTGACCCAACTGGGGGCAAAGGTTTGAAGACACAGCCTCCCCCGCCAGCCCCAAGCTGGGCCGAGG 
CGCGTTTGTGTATATCTGCCTCCCCTGTCTGTAAGGAGCAGCGGGAACGGAGCTTCGGAGCCTCCTCAGTGAAGGTGGTGGGGCTGCCGG 
*TCTCCCrTCTC??'^C'"^'^TrnrrrArr'rTrTTrArrAAnrrrAnnrTrf:r:Af:nArrrTrnAA&ArAnArnr:nTrTnAnArTnAAAATGr, 
TTTACCAGCTCCCAGGTGACTTCAGTCrGTGTATTGTGTAAATGAGTAAAACATTTTATTTCTTTTTAAAAAAAAAAA 



strates were purchased from Sigma. The rate of cleavage of individual 
substrate was determined against time with a Hitachi F-4500 fluores- 
cence spectrophotometer. 
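
The final substrate concentration quoted in the assay description follows from a simple dilution (C1·V1 = C2·V2). The sketch below only re-derives that number from the stated volumes; the function name `final_concentration_mM` is illustrative and does not come from the paper.

```python
def final_concentration_mM(stock_mM: float, stock_ul: float, buffer_ul: float) -> float:
    """Concentration after adding stock_ul of stock solution to buffer_ul of buffer."""
    total_ul = stock_ul + buffer_ul
    return stock_mM * stock_ul / total_ul


# Values taken from the assay description: 10 µl of 2 mM substrate
# added to 190 µl of enzyme in Tris buffer (200 µl total).
substrate_mM = final_concentration_mM(stock_mM=2.0, stock_ul=10.0, buffer_ul=190.0)
print(f"final substrate concentration: {substrate_mM:.2f} mM")
```

The 20-fold dilution (10 µl into a 200 µl total) gives the 0.1 mM stated in the text.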

RESULTS AND DISCUSSION 

Purification of Matriptase from Human Milk — In our previous study
(7), a small proportion of the matriptase molecules were identified as
complexes in human breast cancer cells. We have subsequently found
human milk to be a good source for isolation of larger quantities of
the matriptase complexes (39). We first purified from human milk a
matriptase complex with an apparent size of 95 kDa using
anti-matriptase mAb 21-9-Sepharose affinity chromatography (Fig. 1A).
The 95-kDa complex is capable of being converted by boiling to
matriptase plus a 40-kDa protein doublet. Both the 95-kDa complex and
matriptase itself were recognized by anti-matriptase mAb 21-9
(Fig. 1B). Although sequence analysis of the 40-kDa binding protein has
shown it to be a serine protease inhibitor (see below), some residual
gelatinolytic activity was observed for the 95-kDa matriptase-inhibitor
complex (Fig. 1C). When matriptase and its binding protein were
subjected to N-terminal sequencing, only 11 amino acid residues
(VVGGTDADEGE) from matriptase were obtained, with relatively low
recovery. In addition, 12 amino acid residues (GPPPAPPGLPAG) were
obtained from the amino terminus of the 40-kDa binding protein. We
searched GenBank™ using these amino acid sequences for proteins related
or corresponding to matriptase and its binding protein. The binding
protein of matriptase was identified to be a Kunitz-type serine protease



[Fig. 4 image: multiple sequence alignment of the serine protease domains of matriptase, enterokinase, TMPRSS2, Sb-sbd, hepsin, factor XI, plasminogen, trypsin, and chymotrypsin, with the cleavage site marked.]



Fig. 4. Comparison of the amino acid sequence of the C-terminal region of matriptase with trypsin, chymotrypsin, and the 
catalytic domains of other serine proteases. The C-terminal region (amino acids 431-683) of matriptase is compared with human trypsin (21); 
human chymotrypsin (22); the catalytic chains of human enteropeptidase (16), human hepsin (17), human blood coagulation factor XI (19), and 
human plasminogen; and the serine protease domains of two transmembrane serine proteases, human TMPRSS2 (32) and the Drosophila 
Stubble-stubbloid gene (Sb-sbd) (33). Gaps to maximize homologies are indicated by dashes. Residues in the catalytic triads (matriptase His-484, 
Asp-539, and Ser-633) are boxed and indicated (A). The conserved activation motif ((R/K)VIGG) is boxed, and the proteolytic activation site is 
indicated. Eight conserved cysteines needed to form four intramolecular disulfide bonds are boxed, and the likely pairings are as follows: 
Cys-469-Cys-485, Cys-604-Cys-618, Cys-629-Cys-658, and Cys-432-Cys-559. The disulfide bond Cys-432-Cys-559 is observed in two-chain 
serine proteases, but not in trypsin and chymotrypsin. Residues in the substrate pocket (Asp-627, Gly-655, and Gly-665) are boxed and indicated 
(*). It is evident that the residue positioned at the bottom of the substrate pocket is Asp in trypsin-like proteases, including matriptase, but Ser 
in chymotrypsin. 



inhibitor. This inhibitor is known to be a reversible and com- 
petitive serine protease inhibitor that was reported to inhibit 
the hepatocyte growth factor activator; thus, it was named HAI 
(9). The detailed characterization of HAI from the matriptase 
complex is reported in the accompanying paper (39). The 11 
amino acid residues from matriptase were identical to a de- 
duced amino acid sequence from a 2.9-kilobase pair cDNA 
called SNC19. We subsequently obtained nine internal amino 
acid residues (DYVEINGEK) from matriptase. These were also 
identical to the predicted translated protein sequences of 
SNC19. However, numerous stop codons were observed in this 
deposited SNC19 sequence, resulting in several small predicted 
translation products. Thus, a 2830-base pair cDNA fragment 
was obtained by reverse transcriptase-polymerase chain reac- 
tion using two primers based on the sequence of SNC19. We 
observed extensive discrepancy (132 bases) between our se- 
quence and that of SNC19. These analyses suggest that there 
might be some errors in the bank-deposited SNC19 sequences 
or that this cDNA encodes a distinct but related protein(s). 
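
The effect of in-frame stop codons on predicted translation products can be illustrated with a minimal sketch. The reference string is the first 30 nucleotides of the open reading frame shown in Fig. 3; the single-base substitution is a hypothetical error introduced here for illustration and is not one of the actual discrepancies in the deposited SNC19 sequence. All function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, frame: int = 0) -> str:
    """Translate dna in the given frame; '*' marks a stop, 'X' an unknown codon."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def stop_positions(protein: str) -> list[int]:
    """0-based codon indices at which a stop codon occurs."""
    return [i for i, aa in enumerate(protein) if aa == "*"]


def fragments(protein: str) -> list[str]:
    """Peptides that would be predicted between consecutive stop codons."""
    return [piece for piece in protein.split("*") if piece]


reference = "ATGGCCGAGGAGCGCGTAGTCATGCTGCCC"  # start of the ORF in Fig. 3
with_error = "ATGGCCTAGGAGCGCGTAGTCATGCTGCCC"  # hypothetical G->T change in codon 3

print(translate(reference))
print(stop_positions(translate(with_error)), fragments(translate(with_error)))
```

A single substitution that creates a stop codon splits the predicted product into short peptides, which is the pattern described for the deposited sequence.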

Verification of SNC19 cDNA Encoding Matriptase — In addition to the
sequence identity of matriptase to a portion of SNC19, we examined the
immunoreactivity of anti-matriptase mAbs to the SNC19 product to verify
whether SNC19 encodes matriptase. SNC19 cDNA was inserted into the
eukaryotic expression vector pcDNA3.1 and transfected into COS-7 monkey
kidney fibroblasts, which do not express matriptase. An immunoreactive
band of the same size as matriptase from T-47D human breast cancer
cells (Fig. 2, lane 3) was detected by anti-matriptase mAb M32 in
SNC19-transfected COS-7 cells (lane 1), but not in control COS-7 cells
(lane 2). These results, when combined with the internal amino acid
sequences from matriptase demonstrating identity to the deduced amino
acid sequences of SNC19, suggest that SNC19 encodes matriptase.

Nucleotide and Predicted Amino Acid Sequences of a Matriptase cDNA
Clone — The nucleotide and amino acid sequences of SNC19 are shown in
Fig. 3. Matriptase cDNA is likely to be 2955 base pairs long when the
5'-end 33 bases and the 3'-end 92 bases from SNC19 are added to the
reverse transcriptase-polymerase chain reaction fragment (2830 base
pairs long). The translation initiation site was assigned to the






A LDL-receptor type regions 



Fig. 5. Alignment of partial sequences of the noncatalytic domain with
those of homologous regions in other proteins. A, the cysteine-rich
repeats of matriptase (amino acids 280-314, 315-351, 352-387, and
394-430) are compared with the consensus sequences of the human LDL
receptor (23), LDL receptor-related protein (LRP) (24), human perlecan
(34), and rat GP-330 (35). The consensus sequences are boxed. B,
C1r/s-type sequences of matriptase (Mt; amino acids 42-155 and 168-268)
are compared with selected domains of human complement subcomponent
C1r (amino acids 193-298) (25, 26), C1s (amino acids 175-283) (27, 28),
Ra-reactive factor (RaRF) (amino acids 185-290) (36, 37), and a
calcium-dependent serine protease (CSP) (amino acids 181-289) (38). The
consensus sequences are boxed.



[Fig. 5 image: sequence alignments. Panel A rows: matriptase cysteine-rich repeats and the consensus sequences of the LDL receptor, LRP, perlecan, and GP-330. Panel B (C1r/s-type region) rows: Mt (1), Mt (2), C1r (2), C1s (2), RaRF (2), and CSP (2).]



[Fig. 6 image: domain schematic from NH2 to COOH terminus: putative signal peptide, C1r/s domains, LDL receptor domains I-IV, serine protease domain.]


Fig. 6. Domain structure of matriptase. A schematic representation of
the structure of matriptase is presented. The protease consists of 683
amino acids, and the protein product has a calculated mass of
75,626 Da. The protease contains two tandem complement subcomponent C1r
and C1s domains and four tandem LDL receptor domains. The serine
protease domain is at the carboxyl terminus.



fifth methionine codon because the sequence GTCATGG matches a favorable
Kozak consensus sequence (10). This methionine is followed by four
positively charged amino acids and a 14-amino acid hydrophobic region
(Ser-18-Ser-31), a putative signal peptide. Assuming this methionine
codon to be the initiator, the open reading frame was 2049 base pairs
long, and thus, the deduced amino acid sequence was composed of 683
residues with a calculated molecular mass of 75,626 Da. The two
stretches of amino acid sequences (DYVEINGEK and VVGGTDADEGE) obtained
from matriptase are located in amino acids 228-236 and 443-453; thus,
the translation frame is likely to be correct. There are three
potential N-glycosylation sites with the canonical Asn-X-(Ser/Thr)
sequence and an RGD sequence. An RGD sequence from proteins of the
extracellular matrix has been found to mediate their interactions with
integrins (11).
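
The sequence bookkeeping in this section can be checked with a short sketch. The lengths and mass are the values stated in the text; the motif scanners use the Asn-X-(Ser/Thr) definition given above, and the short test peptide is a made-up string for illustration, not part of the matriptase sequence. All names are illustrative.

```python
import re

# Values stated in the text.
RT_PCR_FRAGMENT_BP = 2830
FIVE_PRIME_ADDED_BP = 33
THREE_PRIME_ADDED_BP = 92
ORF_BP = 2049
CALCULATED_MASS_DA = 75626

cdna_bp = RT_PCR_FRAGMENT_BP + FIVE_PRIME_ADDED_BP + THREE_PRIME_ADDED_BP
residues = ORF_BP // 3  # three bases per codon, stop codon not counted
mean_residue_mass_da = CALCULATED_MASS_DA / residues


def n_glycosylation_sites(protein: str) -> list[int]:
    """1-based positions of Asn in Asn-X-(Ser/Thr); lookahead keeps overlapping hits."""
    return [m.start() + 1 for m in re.finditer(r"N(?=.[ST])", protein)]


def rgd_sites(protein: str) -> list[int]:
    """1-based positions of the Arg in each Arg-Gly-Asp tripeptide."""
    return [m.start() + 1 for m in re.finditer(r"(?=RGD)", protein)]


toy_peptide = "MKNGSARGDLNATQ"  # hypothetical sequence, for illustration only

print(cdna_bp, residues, round(mean_residue_mass_da, 1))
print(n_glycosylation_sites(toy_peptide), rgd_sites(toy_peptide))
```

The same two scanners could be run over the full 683-residue translation to locate the three potential N-glycosylation sites and the RGD sequence reported here.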

Structure of the Matriptase Catalytic Domain — A homology 
search for the deduced amino acid sequence by BLAST in the 
Swiss-Prot data base revealed that the carboxyl terminus at 
residues 432—683 of matriptase is homologous to other serine 
proteases and that matriptase contains the invariant catalytic 
triad, a characteristic disulfide bond pattern, and overall se- 
quence similarity. Compared with the archetype serine prote- 
ase chymotrypsin (12, 13) and other serine proteases, the three 
amino acids (His-484, Asp-539, and Ser-633) are likely to cor- 
respond to those in chymotrypsinogen (His-57, Asp-102, and 
Ser-195) and are likely to be essential for catalytic activity (14). 
The six most conserved cysteines needed to form three intramolecular
disulfide bonds that stabilize the catalytic pocket have
been determined in other chymotrypsin-related proteases. The
most likely cysteine pairings in matriptase are thus as follows:
Cys-469-Cys-485, Cys-604-Cys-618, and Cys-629-Cys-658.
Matriptase also contains two additional cysteines (Cys-432- 
Cys-559) that correspond to those used in two-chain proteases, 
such as enteropeptidase (15, 16), hepsin (17), plasma kallikrein 
(18), blood coagulation factor XI (19), and plasminogen (20), but 
not in trypsin (21) or chymotrypsin (22) (Fig. 4). 

A putative proteolytic activation site (Arg-442) of matriptase 
in an Arg-Val-Val-Gly-Gly motif is similar to the characteristic 
RIVGG motif in other serine proteases. As mentioned above, a 
conserved intramolecular disulfide bond is found in those ser- 
ine proteases that are synthesized as single-chain zymogens 
and are proteolytically activated to become active two-chain
forms. This disulfide bond is proposed to hold together the 
active catalytic fragment with their noncatalytic N-terminal 
fragments. This conserved intramolecular disulfide bond has 
been also observed in matriptase (Cys-432-Cys-559). These 
sequence analyses suggest that matriptase may be synthesized 
as a single-chain zymogen and may become proteolytically ac- 
tivated to a two-chain form. If this is the case, the majority of 
matriptase molecules in the conditioned medium of T-47D 
breast cancer cells are likely to be in the zymogen form; the 
two-chain matriptase represents only a minor proportion of the 
total, consistent with the purified matriptase from T-47D hu- 
man breast cancer cells exhibiting an apparent size of 80 kDa 
under reduced conditions (data not shown). This conclusion is 
also supported by the observation that the proposed N-terminal 
sequences for the catalytic chain of matriptase are identical to 
the stretch of amino acid residues (WGGTDADEGE) that were 
obtained from milk-derived matriptase with very low recovery 
when matriptase was subjected to N-terminal sequencing. 

The substrate specificity (S1) pocket of matriptase is likely to
be composed of Asp-627, positioned at its bottom, with Gly-655 
and Gly-665 at its neck, indicating that matriptase is a typical 
trypsin-like serine protease. The predicted preferential cleav- 
age for matriptase at amino acid residues with positively 
charged side chains was tested with 10 synthetic substrates 
with Arg and Lys residues as P1 sites. In our preliminary
studies (data not shown), matriptase was able to cleave the
following synthetic substrates, presented as follows from the 
most rapid to the slowest: Boc-Gln-Ala-Arg-AMC, Boc-benzyl- 
Glu-Gly-Arg-AMC, Boc-Leu-Gly-Arg-AMC, Boc-benzyl-Asp- 
Pro-Arg-AMC, Boc-Phe-Ser-Arg-AMC, Boc-Val-Pro-Arg-AMC, 
succinyl-Ala-Phe-Lys-AMC, Boc-Leu-Arg-Arg-AMC, Boc-Gly- 
Lys-Arg-AMC, and Boc-Leu-Ser-Thr-Arg-AMC. Thus, matrip- 
tase may prefer substrates with amino acid residues containing 
small side chains, such as Ala and Gly, as P2 sites. 
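
The P1 and P2 residues behind this conclusion can be tabulated directly from the substrate names. The sketch below is an illustration only: the helper `p1_p2` is a hypothetical name, and it assumes each name ends with the AMC leaving group so that P1 is the residue immediately before it.

```python
# Substrates in the order given in the text (fastest to slowest cleavage).
SUBSTRATES = [
    "Boc-Gln-Ala-Arg-AMC",
    "Boc-benzyl-Glu-Gly-Arg-AMC",
    "Boc-Leu-Gly-Arg-AMC",
    "Boc-benzyl-Asp-Pro-Arg-AMC",
    "Boc-Phe-Ser-Arg-AMC",
    "Boc-Val-Pro-Arg-AMC",
    "succinyl-Ala-Phe-Lys-AMC",
    "Boc-Leu-Arg-Arg-AMC",
    "Boc-Gly-Lys-Arg-AMC",
    "Boc-Leu-Ser-Thr-Arg-AMC",
]


def p1_p2(substrate: str) -> tuple:
    """Return (P1, P2) for a peptidyl-AMC substrate name.

    The last token is taken to be the leaving group (AMC), so P1 is the
    token before it and P2 is the token before P1.
    """
    parts = substrate.split("-")
    return parts[-2], parts[-3]


for rank, name in enumerate(SUBSTRATES, start=1):
    p1, p2 = p1_p2(name)
    print(f"{rank:2d}  P2={p2:<3}  P1={p1}  {name}")
```

Every substrate carries Arg or Lys at P1, and the three most rapidly cleaved ones carry Ala or Gly at P2, which is the pattern the text summarizes.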

Structure Motifs of the Noncatalytic Region of Matriptase — 
The noncatalytic region of matriptase contains two sets of 
repeating sequences, which may serve as regulatory and/or 
binding domains for interactions with other proteins. Four 
tandem repeats of ~35 amino acids including six conserved
cysteine residues (Fig. 5A) were found at the amino-terminal
region (amino acids 280-430) of its serine protease domain. 
They are homologous to the cysteine-containing repeat of the 
LDL receptor (23) and related proteins (24). All of these cysteine
residues are likely to be involved in disulfide bonds. In the
LDL receptor, the homologous seven repeating sequences serve 
as the ligand-binding domain. By analogy, the four tandem 
cysteine-containing repeats in matriptase may also be the sites 
of interaction with other macromolecules. In addition, the cys- 
teine-containing LDL receptor domain was found in other pro- 
teases such as enteropeptidase (15, 16). 

The amino-terminal region of matriptase (amino acids 42-268)
contains another two tandem segments with internal homology.
These segments resemble partial sequences originally
identified in complement subcomponents C1r (25, 26) and C1s
(27, 28). This C1r/s domain was also found in other serine
proteases, such as enteropeptidase, an activator of trypsinogen
(15, 16), and in the astacin subfamily of zinc metalloproteases,
such as bone morphogenetic protein-1 (29) and Drosophila
tolloid gene, a dorsal-ventral patterning protein (30). Although
the exact roles of the C1r/s domains in these proteins remain
unclear, a deletion of the first C1r/s domain in complement
subcomponent C1r impairs tetramer formation of C1r with C1s
(31). These results suggest that this domain may be involved in
protein-protein interactions. In our previous study (7), a small
proportion of the matriptase in breast cancer cells was identified
in its complexes. One of the complexes has been isolated
from human milk, and the binding protein was identified as a
fragment of a Kunitz-type serine protease inhibitor. Whether
the LDL receptor domain and the C1r/s domain in matriptase
are both involved in the interaction with the Kunitz-type serine
protease inhibitor remains to be investigated.

In conclusion, matriptase is a trypsin-like serine protease 
with several potential regulatory modules (Fig. 6). Its broad 
spectrum cleavage activity may contribute to the degradation 
of the extracellular matrix, activation of other proteases, and 
processing of growth factors. All of these ascribed functions 
could contribute to important aspects of tumor progression 
such as cancer invasion and to physiological processes such as
differentiation and lactation. The presence of potential protein- 
protein interaction domains and ligand-binding domains in 
matriptase suggests that the interaction of matriptase with 
other macromolecules on the cell surface (such as the luminal 
surface of the mammary gland) may regulate its activation, 
inhibition, and presentation. Aberrant regulation of matriptase 
processing may be involved in the malignant progression of 
cancers. 



Acknowledgments — We thank Dr. Henry Yang for the automated 
DNA sequencing that was performed at the Lombardi Cancer Center 
Macromolecular Synthesis and Sequencing Shared Resource. We thank
the Lombardi Cancer Center Tissue Culture Resource for cells and 
reagents. 

REFERENCES 

1. Mignatti, P., and Rifkin, D. B. (1993) Physiol. Rev. 73, 161-195

2. Nielsen, B. S., Sehested, M., Timshel, S., Pyke, C., and Dano, K. (1996) Lab. Invest. 74, 168-177

3. Pyke, C., Graem, N., Ralfkiaer, E., Ronne, E., Hoyer-Hansen, G., Brunner, N., and Dano, K. (1993) Cancer Res. 53, 1911-1915

4. Polette, M., Gilbert, N., Stas, I., Nawrocki, B., Noel, A., Remacle, A., Stetler-Stevenson, W. G., Birembaut, P., and Foidart, M. (1994) Virchows Arch. 424, 641-645

5. Okada, A., Bellocq, J. P., Rouyer, N., Chenard, M. P., Rio, M. C., Chambon, P., and Basset, P. (1995) Proc. Natl. Acad. Sci. U. S. A. 92, 2730-2734

6. Shi, Y. E., Torri, J., Yieh, L., Wellstein, A., Lippman, M. E., and Dickson, R. B. (1993) Cancer Res. 53, 1409-1415

7. Lin, C.-Y., Wang, J. K., Torri, J., Dou, L., Sang, Q. A., and Dickson, R. B. (1997) J. Biol. Chem. 272, 9147-9152

8. Matsudaira, P. (1987) J. Biol. Chem. 262, 10035-10038

9. Shimomura, T., Denda, K., Kitamura, A., Kawaguchi, T., Kito, M., Kondo, J., Kagaya, S., Qin, L., Takata, H., Miyazawa, K., and Kitamura, N. (1997) J. Biol. Chem. 272, 6370-6376

10. Kozak, M. (1984) Nucleic Acids Res. 12, 857-872

11. Ruoslahti, E., and Pierschbacher, M. D. (1987) Science 238, 491-497

12. Hartley, B. S., and Kauffman, D. L. (1966) Biochem. J. 101, 229-231

13. Brown, J. R., and Hartley, B. S. (1966) Biochem. J. 101, 214-228

14. Hartley, B. S., Brown, J. R., Kauffman, D. L., and Smillie, L. B. (1965) Nature 207, 1157-1159

15. Matsushima, M., Ichinose, M., Yahagi, N., Kakei, N., Tsukada, S., Miki, K., Kurokawa, K., Tashiro, K., Shiokawa, K., and Shinomiya, K. (1994) J. Biol. Chem. 269, 19976-19982

16. Kitamoto, Y., Yuan, X., Wu, Q., McCourt, D. W., and Sadler, J. E. (1994) Proc. Natl. Acad. Sci. U. S. A. 91, 7588-7592

17. Leytus, S. P., Loeb, K. R., Hagen, F. S., Kurachi, K., and Davie, E. W. (1988) Biochemistry 27, 1067-1074

18. Chung, D. W., Fujikawa, K., McMullen, B. A., and Davie, E. W. (1986) Biochemistry 25, 2410-2417

19. Fujikawa, K., Chung, D. W., Hendrickson, L. E., and Davie, E. W. (1986) Biochemistry 25, 2417-2424

20. Forsgren, M., Raden, B., Israelsson, M., Larsson, K., and Heden, L. O. (1987) FEBS Lett. 213, 254-260

21. Emi, M., Nakamura, Y., Ogawa, M., Yamamoto, T., Nishide, T., Mori, T., and Matsubara, K. (1986) Gene (Amst.) 41, 305-310

22. Tomita, N., Izumoto, Y., Horii, A., Doi, S., Yokouchi, H., Ogawa, M., Mori, T., and Matsubara, K. (1989) Biochem. Biophys. Res. Commun. 158, 569-575

23. Sudhof, T. C., Goldstein, J. L., Brown, M. S., and Russell, D. W. (1985) Science 228, 815-822

24. Herz, J., Hamann, U., Rogne, S., Myklebost, O., Gausepohl, H., and Stanley, K. K. (1988) EMBO J. 7, 4119-4127

25. Leytus, S. P., Kurachi, K., Sakariassen, K. S., and Davie, E. W. (1986) Biochemistry 25, 4855-4863

26. Journet, A., and Tosi, M. (1986) Biochem. J. 240, 783-787

27. Mackinnon, C. M., Carter, P. E., Smyth, S. J., Dunbar, B., and Fothergill, J. E. (1987) Eur. J. Biochem. 169, 547-553

28. Tosi, M., Duponchel, C., Meo, T., and Julier, C. (1987) Biochemistry 26, 8516-8524

29. Wozney, J. M., Rosen, V., Celeste, A. J., Mitsock, L. M., Whitters, M. J., Kriz, R. W., Hewick, R. M., and Wang, E. A. (1988) Science 242, 1528-1534

30. Shimell, M. J., Ferguson, E. L., Childs, S. R., and O'Connor, M. B. (1991) Cell 67, 469-481

31. Cseh, S., Gal, P., Sarvari, M., Dobo, J., Lorincz, Z., Schumaker, V. N., and Zavodszky, P. (1996) Mol. Immunol. 33, 351-359

32. Paoloni-Giacobino, A., Chen, H., Peitsch, M. C., Rossier, C., and Antonarakis, S. E. (1997) Genomics 44, 309-320

33. Appel, L. F., Prout, M., Abu-Shumays, R., Hammonds, A., Garbe, J. C., Fristrom, D., and Fristrom, J. (1993) Proc. Natl. Acad. Sci. U. S. A. 90, 4937-4941

34. Murdoch, A. D., Dodge, G. R., Cohen, I., Tuan, R. S., and Iozzo, R. V. (1992) J. Biol. Chem. 267, 8544-8557

35. Raychowdhury, R., Niles, J. L., McCluskey, R. T., and Smith, J. A. (1989) Science 244, 1163-1165

36. Takada, F., Takayama, Y., Hatsuse, H., and Kawakami, M. (1993) Biochem. Biophys. Res. Commun. 196, 1003-1009

37. Sato, T., Endo, Y., Matsushita, M., and Fujita, T. (1994) Int. Immunol. 6, 665-669

38. Kinoshita, H., Sakiyama, H., Tokunaga, K., Imajoh-Ohmi, S., Hamada, Y., Isono, K., and Sakiyama, S. (1989) FEBS Lett. 250, 411-415

39. Lin, C.-Y., Anders, J., Johnson, M., and Dickson, R. B. (1999) J. Biol. Chem. 274, 18237-18242





Exhibit 2 1 



Article No. jmbi.1999.3089 available online at http://www.idealibrary.com
J. Mol. Biol. (1999) 292, 361-373




Crystal Structure of Enteropeptidase Light Chain 
Complexed with an Analog of the Trypsinogen 
Activation Peptide 

Deshun Lu¹, Klaus Fütterer², Sergey Korolev², Xinglong Zheng¹,
Kai Tan¹, Gabriel Waksman² and J. Evan Sadler¹*

¹Howard Hughes Medical Institute, Department of Medicine

²Department of Biochemistry and Molecular Biophysics

Washington University School of Medicine, 660 South Euclid
Avenue, St. Louis, MO 63110, USA



Enteropeptidase is a membrane-bound serine protease that initiates the
activation of pancreatic hydrolases by cleaving and activating trypsinogen.
The enzyme is remarkably specific and cleaves after lysine residues
of peptidyl substrates that resemble trypsinogen activation peptides such
as Val-(Asp)4-Lys. To characterize the determinants of substrate specificity,
we solved the crystal structure of the bovine enteropeptidase catalytic
domain to 2.3 Å resolution in complex with the inhibitor
Val-(Asp)4-Lys-chloromethane. The catalytic mechanism and contacts with
lysine at substrate position P1 are conserved with other trypsin-like serine
proteases. However, the aspartyl residues at positions P2-P4 of the inhibitor
interact with the enzyme surface mainly through salt bridges with the
Nζ atom of Lys99. Mutation of Lys99 to Ala, or acetylation with acetic
anhydride, specifically prevented the cleavage of trypsinogen or
Gly-(Asp)4-Lys-β-naphthylamide and reduced the rate of inhibition by
Val-(Asp)4-Lys-chloromethane 22 to 90-fold. For these reactions, Lys99 was
calculated to account for 1.8 to 2.5 kcal mol⁻¹ of the free energy of transition
state binding. Thus, a unique basic exosite on the enteropeptidase surface
has evolved to facilitate the cleavage of its physiological substrate,
trypsinogen.

© 1999 Academic Press

*Corresponding author

Keywords: crystal structure; enteropeptidase; serine protease; substrate
recognition
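
The link between a fold-change in rate and a free energy contribution, as quoted in the abstract, is commonly expressed as ΔΔG = RT ln(ratio). The sketch below illustrates that relation only. It is not the authors' calculation: the temperature of 298.15 K and the function name are assumptions, and the abstract does not state which measurement underlies each end of the quoted range.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def ddg_from_rate_ratio(fold_change: float, temp_k: float = 298.15) -> float:
    """Free energy difference (kcal/mol) implied by a fold-change in rate.

    Uses ddG = R * T * ln(fold_change); temp_k is an assumed temperature.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    return R_KCAL * temp_k * math.log(fold_change)


for fold in (22, 90):
    print(fold, round(ddg_from_rate_ratio(fold), 2))
```

Under these assumptions a 22-fold change corresponds to roughly 1.8 kcal/mol, in line with the lower end of the range given in the abstract; the upper end depends on the reaction and conditions actually used.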



Introduction 

Enteropeptidase was discovered one hundred
years ago in I. P. Pavlov's laboratory (Pavlov,
1902) as the first known enzyme to activate other
enzymes, and it remains a remarkable example of
how serine proteases have been crafted by evolution
to regulate metabolic pathways. Enteropeptidase
controls a primordial enzymatic cascade that
is conserved among vertebrates and is essential
for normal intestinal digestion. When pancreatic
secretions enter the duodenum, enteropeptidase
recognizes the acidic activation peptide of trypsinogen
and cleaves it. The trypsin product then
cleaves and activates the other zymogens in pancreatic
fluid, enabling the digestion of food. Congenital
deficiency of enteropeptidase in humans
causes severe intestinal malabsorption with
diarrhea, vomiting, and growth failure that can be
treated successfully by supplementation with pancreatic
extract (Hadorn et al., 1969; Haworth et al.,
1971).

Present addresses: D. Lu, Cardiovascular Research
Division, Eli Lilly and Company, Indianapolis, IN
46285, USA; S. Korolev, Structural Biology Center,
Argonne National Laboratory, 9700 S. Cass Ave.,
Argonne, IL 60439, USA.

E-mail address of the corresponding author:
esadler@im.wustl.edu

Several enteropeptidase domains are required
for the efficient activation of trypsinogen. Enteropeptidase
is a two-chain polypeptide that is
derived from a single-chain precursor, and consists
of an N-terminal ~120 kDa heavy chain that is
disulfide-linked to a C-terminal ~47 kDa light chain.
A transmembrane segment in the heavy chain
anchors enteropeptidase in the brush border of
duodenal enterocytes. The light chain consists
of a chymotrypsin-like serine protease domain
(reviewed by Lu & Sadler, 1998). Replacement of
the transmembrane domain by a cleavable signal
peptide does not impair trypsinogen activation,
indicating that membrane association is not
required for substrate recognition (Lu et al., 1997).
The removal of heavy chain domains by reduction
(Light & Fonseca, 1984), proteolysis (Mikhailova &
Rumsh, 1999), or mutagenesis (LaVallie et al., 1993;
Lu et al., 1997) reduces the rate of trypsinogen
activation ~500-fold, demonstrating that the heavy
chain is necessary for optimal cleavage of trypsinogen.
The enteropeptidase light chain, however, is
sufficient for the normal recognition of small
peptidyl substrates that resemble the trypsinogen
activation peptide Val-(Asp)4-Lys (LaVallie et al.,
1993; Lu et al., 1997).

0022-2836/99/370361-13 $30.00/0    © 1999 Academic Press

The structural determinants of substrate specificity
have not been identified on the enteropeptidase
light chain, but their locations have been
proposed based upon comparisons with other
serine proteases. The enteropeptidase serine protease
domain contains a basic tetrapeptide segment
consisting of Arg96-Arg-Arg-Lys99 for porcine
(Matsushima et al., 1994), mouse (Yuan et al., 1998),
and human (Kitamoto et al., 1994) enteropeptidase;
or Lys96-Arg-Arg-Lys99 for bovine (Kitamoto et al.,
1994; LaVallie et al., 1993) and rat enteropeptidase
(Yahagi et al., 1996). This segment is not conserved
in other serine proteases, and computer modeling
suggests that it is located on the protein surface
where it might bind the acidic P2-P5 residues of
trypsinogen activation peptides (Kitamoto et al.,
1994; Matsushima et al., 1994) (see the legend to
Figure 2 for the residue numbering). Thus, enteropeptidase
appears to have an extended binding
site or "exosite", distinct from the catalytic center,
which recognizes substrate amino acid residues on
the N-terminal side of the cleaved bond. At present
there is no evidence that enteropeptidase has
specificity for amino acid residues C-terminal to
the scissile bond.
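
The P-site labels used in this paragraph follow the Schechter and Berger convention cited in the legend to Figure 2: residues are numbered P1, P2, P3 and so on moving from the cleaved bond toward the N terminus, and P1', P2' on the C-terminal side. A minimal sketch of that numbering, applied to the activation peptide Val-(Asp)4-Lys, is given below; the function name `label_positions` is hypothetical.

```python
def label_positions(n_side, c_side=()):
    """Assign Schechter-Berger labels around a scissile bond.

    n_side: residues N-terminal to the bond, written N to C.
    c_side: residues C-terminal to the bond, written N to C.
    """
    labels = {}
    # P1 is the residue immediately N-terminal to the cleaved bond.
    for i, residue in enumerate(reversed(list(n_side)), start=1):
        labels[f"P{i}"] = residue
    # P1' is the residue immediately C-terminal to the cleaved bond.
    for i, residue in enumerate(c_side, start=1):
        labels[f"P{i}'"] = residue
    return labels


activation_peptide = ["Val", "Asp", "Asp", "Asp", "Asp", "Lys"]
for label, residue in label_positions(activation_peptide).items():
    print(label, residue)
```

With this numbering the lysine sits at P1 and the four aspartates occupy P2 through P5, which is why the text refers to the acidic P2-P5 residues.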

Similar exosites in other highly regulated serine
proteases are well documented to control the recognition
of substrates, cofactors and inhibitors. For
example, the blood clotting protease thrombin has
two so-called "anion-binding exosites" (Bode et al.,
1992). Exosite 1 interacts with acidic regions of preferred
substrates such as fibrinogen and cofactors
such as thrombomodulin. In contrast to the known
properties of enteropeptidase, however, thrombin
exosite 1 interacts with amino acid residues on the
C-terminal side of the cleaved bond. Thrombin
exosite 2 is on the opposite side of the molecule
and interacts with heparin, thereby promoting the
inhibition of thrombin by antithrombin (Sheehan &
Sadler, 1994). These exosites have been modified
by mutagenesis to create thrombin variants with
novel properties (Sheehan & Sadler, 1994; Wu et al.,
1991). The characterization of enteropeptidase exosites,
by analogous approaches, would advance
our understanding of the regulation of digestion
and facilitate the design of enteropeptidase derivatives
with new substrate specificity.

We now have determined the crystal structure of
the bovine enteropeptidase light chain complexed
with an inhibitor, Val-(Asp)4-Lys-chloromethane
(VD4K-cm), that mimics the trypsinogen activation
peptide. The catalytic mechanism and the subsite
that recognizes the P1 lysine residue are conserved
with other chymotrypsin-like serine proteases, but
the aspartyl side-chains at positions P2-P4 of
the inhibitor are accommodated mainly by ionic
interactions with a unique exosite on the enzyme
surface. By mutagenesis and chemical modification,
we demonstrate that a single lysyl side-chain
within this exosite is required for the cleavage
of trypsinogen and similar peptidyl substrates.
These distinctive features of
enteropeptidase illustrate the specificity that serine
proteases can acquire by combining modifications
of the protease domain with additional motifs on
accessory domains.

Results 

Structure determination

The crystal structure of the serine protease
domain of bovine enteropeptidase (L-BEK) bound
to the inhibitor VD4K-cm was solved by molecular
replacement using the structure of γ-chymotrypsin
(PDB entry code 1GCD) (Harel et al., 1991) as the
search model, to which enteropeptidase shows
35.9% sequence identity (Figure 1). The structure
was refined to final R factors of R = 23.4% and
Rfree = 26.9% (Figure 2 and Table 1). For ease of
comparison to related serine protease structures,
we use the chymotrypsin-derived residue numbering
scheme proposed by Bode et al. (1992). The
protein used for the present structure determination
(L-BEK) contains only 13 C-terminal amino
acid residues of the enteropeptidase heavy chain.
Note that the usage of the terms "heavy" and
"light" chain is the reverse of what is common
usage for chymotrypsin and thrombin. The present
structure shows an uninterrupted backbone for the
two-chain molecule, comprising residues 1 through
7 (chymotrypsin numbering) of the N-terminal
domain and residues 16 through 243 of the serine
protease domain. Residues 8 through 13 of the
N-terminal domain and residues 244 and 245 of
the serine protease domain protrude freely into the
solvent and could not be modeled.

Tertiary structure 

As expected, based upon its homology to other
serine proteases, L-BEK is very similar in fold to
both representative family members chymotrypsin
and thrombin (Figure 3(a) and (c)): the tertiary
structure consists of two six-stranded β-barrels,
either of which makes up about one half of the
entire molecule. The structure of L-BEK superimposes
on chymotrypsin with a root-mean-square
deviation of 1.10 Å for 224 Cα positions, and it
superimposes on thrombin with a root-mean-square
deviation of 1.23 Å for 234 Cα positions.
Variations in secondary structure occur mainly in
the loop regions. L-BEK also contains, relative to
chymotrypsin, an additional β-strand, β1′, and an
additional small 3₁₀-helix, 3₁₀1′ (Figures 1 and 3(a)).
The 3₁₀-helix is part of the so-called "60-loop" that
connects helix α1 and strand β4, and a similar
3₁₀-helix is present in the much longer 60-loop of
thrombin.

[Figure 1: alignment of the L-BEK, thrombin (Thromb) and chymotrypsin (Chymo) sequences under chymotrypsinogen residue numbering; the alignment itself is not recoverable from the scan.]

Figure 1. Sequence alignment of enteropeptidase (L-BEK), chymotrypsin (Chymo) and thrombin (Thromb) protease
domains. Amino acid sequences are aligned based on topological equivalence of the superimposed crystal structures.
Amino acid residues are numbered based on the sequence of chymotrypsinogen. Residues of L-BEK and the other
proteases are boxed if the separation between Cα positions is within 1.6 Å. Active-site residues (His57, Asp102, Ser195) are
in filled black boxes. Residues in contact with the VD4K-cm inhibitor are shaded in blue. L-BEK secondary structure
elements are indicated below the sequences; helices (α-helix, 3₁₀-helix) are shown as filled boxes and β-strands are
shown as open boxes. Secondary structure elements conserved with γ-chymotrypsin are numbered sequentially, and those
designated by prime numbers (i.e. 3₁₀1′, β1′) are not present in γ-chymotrypsin. The arrow indicates the activation
cleavage site that separates the heavy chain remnant (residues 1-15) from the light chain (residues 16-243).
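
The root-mean-square deviations quoted for the superpositions are averages over paired Cα positions. The sketch below shows only that final averaging step and assumes the two coordinate sets have already been superimposed and paired one to one; the optimal superposition itself is not implemented, and the function name and toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired coordinate lists.

    Each list holds (x, y, z) tuples in the same units (for example Å),
    already superimposed and in matching order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: every atom is displaced by exactly 1 unit along z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(set_a, set_b))
```

Because the value depends on how many positions are paired, the text reports the number of Cα positions alongside each deviation.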

The enteropeptidase serine protease domain is
stabilized by five disulfide bonds, all of which are
conserved with chymotrypsin: Cys1-Cys122,
Cys42-Cys58, Cys136-Cys201, Cys168-Cys182, and
Cys191-Cys220 (Figure 3(a)). Thrombin lacks one
of these disulfide bonds, corresponding to that
between Cys136 and Cys201 of enteropeptidase.
The 13 residue N-terminal chain of L-BEK is covalently
linked to the serine protease domain by
the disulfide bond between Cys1 and Cys122.



Table 1. Data collection and refinement statistics

A. Data collection
  Data set                                  Native
  Radiation, detector system                CuKα, Raxis
  Resolution (Å)                            30-2.3
  Total/unique reflections                  28,051/10,541
  Completeness (%)ᵃ                         92.6 (89.2)
  Rsym (%)ᵇ                                 4.4 (8.8)
B. Refinement
  Resolution (Å)                            30.0-2.3
  Reflections (completeness)ᶜ (%)           9854 (87.6/82.0)
  Non-H atoms                               2023
  R/Rfree (%)ᵈ                              23.4/26.9
  r.m.s. deviationsᵉ
    Bond lengths (Å)                        0.006
    Bond angles (deg.)                      1.39
    B values (main-chain/side-chain) (Å²)   1.5/2.0

ᵃ Completeness for I/σ(I) > 10; value for high resolution
shell (2.38-2.3 Å) in parentheses.

ᵇ Rsym = Σ|I − ⟨I⟩|/ΣI, where I = observed intensity, and
⟨I⟩ = average intensity from multiple observations of
symmetry-related reflections; the value for the high-resolution
shell is in parentheses.

ᶜ Numbers reflect the "working set" of reflections at
F/σ(F) > 2.0; values for completeness for the overall/high-resolution
shell (2.4-2.3 Å) are in parentheses.

ᵈ Rfree was calculated on the basis of 546 reflections (5.5% of
the observed reflections) that were randomly omitted from the
refinement.

ᵉ Root-mean-square (r.m.s.) deviation from ideal bond
lengths and angles (Engh & Huber, 1991) and r.m.s. deviation
in B-factors of bonded atoms.

Figure 2. Representative regions of electron density.
Simulated annealing omit maps, using Fourier coefficients
Fo − Fc and model phases, were calculated by
deleting the VD4K-chloromethane inhibitor either (a)
alone or (b)-(c) including an additional region of 3.5 Å
around it. (a) View of the inhibitor peptide from the protein
outwards. Electron density for the hexapeptide is
observed for positions P1-P4. (Amino acid residues of
peptidyl substrates or inhibitors customarily are numbered
P1, P2, P3, etc., from the scissile bond toward the
N terminus, and P1′, P2′, on the C-terminal side of the
scissile bond. The corresponding subsites on the cognate
protease are numbered S1, S2, S3 and S1′, S2′ (Schechter
& Berger, 1967)). (b) Interaction of the aspartyl side-chains
of residues P2-P4 with Lys99 and Tyr174 of
L-BEK. (c) Covalent linkage of the C terminus of the
inhibitor to the catalytic residues His57 (Nε2-methylene
carbon) and Ser195 (Oγ-carbonyl carbon atom), mimicking
the tetrahedral intermediate of the hydrolysis reaction.
The figure was produced with the program O
(Jones & Thirup, 1986; Jones et al., 1991).

Aside from this single disulfide bond, the interactions
of this short polypeptide with the bulk of
the structure are relatively weak, consisting of an
amino-aromatic interaction between Lys4 and
Trp27, and hydrogen bonds between main-chain
atoms of Gly2 and either Trp207 or Pro120. Consequently,
the remaining residues 8-13 of the heavy
chain are disordered.

The catalytic center

The catalytic center contains the signature structural
elements of serine proteases: the catalytic
triad consisting of Asp102, His57 and Ser195; the
oxyanion hole formed by the main-chain amide
nitrogen atoms of residues 193 and 195; and the S1
subsite or specificity pocket that interacts with the
side-chain of the P1 substrate/inhibitor residue
(Figure 4(a) and (d)). The VD4K-cm inhibitor is
identical in sequence to the trypsinogen activation
peptide and is covalently bound to the catalytic
residues His57 and Ser195 through its C-terminal
residue Lys-P1 (Figures 2(c) and 4(a)). The carbonyl
carbon atom of Lys-P1 forms a tetrahedral hemiketal
with Ser195 Oγ, and the methylene carbon atom
of the inhibitor is bound to the imidazole ring
(Nε2) of His57. This arrangement mimics the tetrahedral
intermediate of the substrate hydrolysis
reaction. The side-chain of Lys-P1 inserts deeply
into the S1 pocket, at the bottom of which Asp189
neutralizes the terminal amino group (Figure 4(b)).
The interactions of Lys-P1 at the bottom of the
specificity pocket also include short hydrogen
bonds to both the hydroxyl group and the carbonyl
oxygen atom of Ser190. Lys-P1 also makes short
hydrogen bonds to two water molecules, WAT438
and WAT407, that correspond to water molecules
429 and 494, respectively, of the thrombin-hirugen
complex (Vijayalakshmi et al., 1994). These two
water molecules are conserved among several serine
protease structures (Krem & Di Cera, 1998).
The aliphatic part of the Lys-P1 side-chain packs
against the main-chain atoms of Phe215 and
Ser214, as well as the Cγ2 atom of Thr213
(Figure 4(b) and (d)).

The extended substrate binding exosite

Despite its covalent attachment to the protein
through the catalytic center, the VD4K-cm inhibitor
is disordered at its N-terminal end and electron
density was observed only for residues Lys-P1
through Asp-P4 (Figure 2(a)). The inhibitor geometry
is remarkably similar to that of D-Phe-Pro-Arg-chloromethane
(PPACK) in thrombin, as illustrated
in Figure 5. The alignment of L-BEK with thrombin,
based only on the Cα atoms of the catalytic
triad, leads to a near perfect superposition of the
two inhibitor molecules, including the Cα positions,
despite their complete lack of sequence similarity.
Although VD4K-cm forms two main-chain to
main-chain hydrogen bonds with residues in
strand β11 (Figures 1 and 4(d)), it does not otherwise
adopt a β-strand configuration in contrast to
what is observed for the thrombin-PPACK structure
(Bode et al., 1992).

Figure 3. Overall fold of enteropeptidase compared to
γ-chymotrypsin and α-thrombin. (a) Stereo ribbon
diagram of L-BEK. The catalytic residues are labeled
and the disulfide bonds are shown in yellow. Superposition
of L-BEK (grey) with (b) γ-chymotrypsin (1GCD, in
cyan) and (c) with human α-thrombin (1PPB, in green).
The structures were aligned with respect to the Cα positions
of the catalytic residues His57, Asp102 and
Ser195, and are shown in the same orientation as for
L-BEK in (a). This Figure was produced with the program
RIBBONS (Carson, 1997), as were Figures 4(a)-(c),
5, and 7.

Aside from the S1 subsite, the major determinant
of VD4K-cm recognition is Lys99. The basic side-chain
of this residue coordinates the aspartic acid
side-chains at positions P2 through P4 of the
inhibitor. These three carboxylate groups surround
the terminal amino-group of Lys99 in a fashion
similar to an inverted tripod. Lys99 forms salt
bridges only with Asp-P2 and Asp-P4, whereas
Asp-P3 is hydrogen bonded to the hydroxyl
moiety of Tyr174 (Figure 4(c) and (d)). Residue
Phe215 is also indirectly involved in substrate
binding, with its phenyl ring serving as a hydrophobic
platform that supports the side-chain of
Lys99 (Figures 2(b) and 4(c)).

Lys99 is part of a sequence of four basic amino
acid residues in the β5β6 loop that, based on molecular
modeling, had been predicted to define the
substrate specificity of enteropeptidase (Kitamoto
et al., 1994; Matsushima et al., 1994). In the present
crystal structure the side-chain of Arg97 is completely
disordered, that of Arg98 is poorly defined,
and both extend into solvent. Lys96 does not make
any close contacts with the inhibitor, but folds
back onto the protein surface to form a short
hydrogen bond (2.8 Å) with the hydroxyl group of
Tyr94. Tyr60 also is in close proximity to the terminal
amino group of Lys96. As discussed below,
the contribution of these basic residues to substrate
recognition was examined further by mutagenesis.

The electrostatic surface of L-BEK (Figure 6)
includes two prominent positive charges in the
vicinity of the inhibitor binding site: Lys99 is on
the N-terminal side and Arg60f is on the C-terminal
side of the scissile bond position. Arg60f is
held in place by hydrophobic interactions with the
aromatic ring of Phe35 and a short hydrogen bond
to the carbonyl oxygen atom of Cys58
(Figure 7). The latter interaction positions the
guanidinium group of Arg60f at a distance of 8 Å
from the catalytic center, where it would not be
expected to have a direct effect on the recognition
of VD4K-cm. In the superposition with thrombin
(Figure 7), the Cα atom of Arg60f is closest to the
Cα atom of Phe60h, but its guanidinium group lies
close to the head group of Lys60f; the latter forms
a hydrogen bond with the carbonyl oxygen atom
of His57. The basic nature of these side-chains and
their similar position relative to the catalytic center
suggest that Arg60f of enteropeptidase and Lys60f
of thrombin may have a similar function in recognition
of residues C-terminal to the scissile bond.
For thrombin, the effects of mutagenesis are consistent
with this hypothesis because alteration of
Lys60f markedly impairs the cleavage of fibrinogen
without affecting the cleavage of D-Phe-pipecolyl-Arg-p-nitroanilide
(Wu et al., 1991).

Mutagenesis and chemical modification 
of L-BEK 

To determine the contribution of specific basic 
amino acid residues to substrate recognition, 
mutant forms of L-BEK were prepared in which 
each of the Arg or Lys residues at positions 60f 
and 96-99 was changed to Ala. The proteins were 
expressed in a baculovirus system and purified 
by affinity chromatography on STI-agarose. In 
addition, a sample of purified L-BEK was treated 
with acetic anhydride. The conditions of acety- 
lation were shown previously to result in the 
efficient modification of lysyl residues on porcine 
enteropeptidase (Baratti & Maroux, 1976). By 






Figure 4. Close-up view of the
inhibitor binding site. (a) The
C-terminal Lys (K-P1) of the inhibitor
is covalently bound (thick lines
in magenta) to His57 and Ser195 of
L-BEK. The carbonyl oxygen atom
of Lys-P1 (K-P1) forms hydrogen
bonds (thin cyan lines) with water
WAT436 and the main-chain nitrogen
atoms of Ser195 and Gly193,
the latter being part of the "oxyanion
hole". (b) S1 recognition
pocket showing protein residues in
contact with K-P1. (c) Stereo view
of the P2-P4 binding sites. The
side-chain of Arg97 is disordered
and modeled as Ala. Inhibitor
residues are labeled in magenta
throughout, and protein residues
are labeled in black. Atom color
coding: carbon, grey; oxygen, red;
nitrogen, blue; sulfur, green; and
water molecules, yellow spheres.
(d) Schematic diagram of protein-inhibitor
interactions. Broken lines
indicate contacts for which the
distances are given in Angstroms.



SDS-polyacrylamide gel electrophoresis, all proteins
appear to be homogeneous. Under non-denaturing
conditions, acetylated L-BEK exhibits
markedly increased electrophoretic mobility consistent
with the neutralization of amino groups
(Figure 8).

Each of these proteins cleaved the small ester
Z-Lys-SBzl with nearly normal kinetics, demonstrating
that the catalytic center was intact (Figure 9
and Table 2). Cleavage of the larger substrates
Gly-(Asp)4-Lys-β-naphthylamide (GD4K-na) and trypsinogen
was decreased minimally by the substitutions
Arg97Ala and Arg98Ala. The mutations
Arg60fAla and Lys96Ala decreased the catalytic
efficiency of GD4K-na cleavage by up to approximately
fivefold (Table 2) and similarly decreased
the relative rate of trypsinogen activation
(Figure 9), indicating a modest change (+0.8 to
+1.0 kcal mol⁻¹) in the free energy of transition
state binding, ΔG_T (Wilkinson et al., 1983). However,
activity toward both of these substrates was
essentially abolished by the mutation Lys99Ala.
Accurate kinetic constants could not be determined
for this mutation (Table 2); the low relative activity
(Figure 9) toward both GD4K-na (~3%) and trypsinogen
(~1.5%) suggests that removal of this lysyl







Figure 5. Structural superposition of the VD4K-cm
inhibitor of enteropeptidase with D-Phe-Pro-Arg-chloromethane
(PPACK) of thrombin (1PPB). The alignment
resulted from the superposition of the Cα positions of
the catalytic residues His57, Asp102, and Ser195 in both
proteins. Enteropeptidase residues and inhibitor atoms
are shown in color-coded sticks: grey for C, red for O,
blue for N. Residues and inhibitor atoms of thrombin
are shown in green sticks. The view is from the protein
outwards.



side-chain increases ΔG_T by 2.1 to 2.5 kcal mol⁻¹.
Acetylation of L-BEK also markedly decreased the
rate of cleavage of both GD4K-na (~13%) and
trypsinogen (~1.5%), but enhanced the cleavage of
Z-Lys-SBzl (Figure 9 and Table 2).

Rate constants for inhibition by VD4K-cm also
were determined to assess the effect of mutations
on the recognition of the trypsinogen activation
peptide (Table 3). The magnitude and direction of
the changes are similar to those observed for cleavage
of GD4K-na and trypsinogen. The substitutions
Arg60fAla, Lys96Ala, Arg97Ala, and
Arg98Ala had modest effects on the inhibition
reaction, increasing ΔG_T by 0.3 to 0.8 kcal mol⁻¹.
In contrast, the mutation Lys99Ala markedly
reduced the rate of inhibition, increasing ΔG_T by
1.8 kcal mol⁻¹. Acetylation of L-BEK also markedly
slowed the rate of inhibition by VD4K-cm, increasing
ΔG_T by 2.7 kcal mol⁻¹. These values of ΔΔG_T
for inhibition by VD4K-cm are consistent with
those estimated from the relative rates of substrate
cleavage (Figure 9).

Discussion 

Structural interpretation of substrate specificity

Limited qualitative studies employing protein
substrates (Anderson et al., 1977; Light et al., 1980)
and synthetic peptides (Maroux et al., 1971) indicate
that mammalian enteropeptidase is remarkably
specific. With few exceptions, the P1 residue
must be basic (e.g. Lys, Arg, or homoarginine) and
the P2 and P3 positions must be acidic (e.g. Asp,
Glu or carboxymethylcysteine). The substituents at
P4 and P5 are less critical, but additional acidic
residues in these positions increase affinity for the
enzyme (Maroux et al., 1971).

The crystal structure of L-BEK provides a
reasonable explanation for these properties. The
catalytic center of enteropeptidase is conserved
with related enzymes that prefer a basic side-chain
in the P1 position such as trypsin, and Lys-P1 of
the inhibitor VD4K-cm makes numerous close contacts
with L-BEK (Figure 4(d)). Acidic residues on
the N-terminal side of residue P1 interact with an
extended exosite on the enzyme surface, and the
number of contacts decreases as the distance from
the catalytic center increases. For example, Asp-P2
main-chain atoms make four close contacts with
L-BEK, and its carboxylate side-chain makes two
H-bonds with the Nζ atom of Lys99; Asp-P3 makes
half as many contacts, Asp-P4 makes only one
H-bond between its carboxylate group and the
Nζ atom of Lys99, and residues Asp-P5 and Val-P6
are disordered. Thus, the interface between L-BEK
and VD4K-cm is consistent with the increased
tolerance for variations in substrate structure at
positions distal to P3.

The distribution of interactions between VD4K- 
cm and bovine enteropeptidase is mirrored by the 
observed variation among trypsinogen activation 
peptides. Sequences are known for at least 30 




Figure 6. Electrostatic surface diagram of the Val-(Asp)4-Lys-chloromethane inhibitor binding site of enteropeptidase.
Negative and positive surface charges are shown in deep red and blue, respectively, with linear interpolation in
between. Conserved water molecules WAT407 and WAT438 are shown as spheres in cyan, inhibitor atoms are
shown as sticks and are color-coded as described in the legend to Figure 4. (a) Overall view. (b) Close up view of the
S1 binding pocket. The Figure was produced with the program GRASP (Nicholls et al., 1991).







Figure 7. Structural role of residue Arg60f in comparison to Lys60f of thrombin. Enteropeptidase secondary structure
elements are shown in grey and atoms are color-coded as described in the legend to Figure 4. Secondary
structure elements and carbon atoms of thrombin are shown in green, keeping all other atom color assignments unaltered.
The structures were aligned as shown in Figure 3. Interestingly, Arg60f aligns with Phe60h of thrombin with
regard to the Cα position, while its guanidinium group is very close to the terminal amino group of Lys60f of
thrombin.




genetically distinct trypsinogens, representing
mammals, birds, amphibians and fish (Bricteux-Gregoire
et al., 1972; Lu & Sadler, 1998). Position
P1 is occupied almost exclusively by Lys. Very few
trypsinogens have Glu instead of Asp at position
P2 or P3. Most residues at position P4 are Asp, but
Glu or Asn occur in ~30% of cases. Position P5
shows more variation; Asp is present in ~60%, but
aromatic, aliphatic, small polar and basic side-chains
also are found. Position P6 is not conserved.
Therefore, the tendency of trypsinogen activation
peptide residues to vary during vertebrate evolution
correlates inversely with the number and
location of close contacts in the L-BEK-VD4K structure.



Figure 8. Gel electrophoresis of enteropeptidase variants.
(a) Samples (5 µg) of affinity purified enteropeptidase
variants were analyzed by SDS-polyacrylamide gel
electrophoresis without reducing agent and visualized
by staining with Coomassie brilliant blue (Laemmli,
1970). The positions of molecular mass markers are indicated
at the right in kilodaltons. (b) Enteropeptidase variants
were analyzed by native gel electrophoresis using
a similar polyacrylamide gel and buffer system except
that SDS was omitted from the sample buffer.



Energetic contributions of specific residues to 
substrate recognition 

The contacts between L-BEK and VD4K-cm are
dominated by ionic interactions between aspartyl
side-chains and Lys99, and the importance of these
interactions is supported by the effect of acetylation
on enteropeptidase specificity. Reaction of
porcine enteropeptidase with acetic anhydride
reduces its activity toward trypsinogen by more
than 98%, but increases its activity toward
L-N-α-benzoylarginine p-nitroanilide (L-BAPNA) by 1.8-fold
(Baratti & Maroux, 1976). These studies were
performed with full-length enteropeptidase and
therefore could not localize the critical modified
residues to either the light chain or the heavy
chain. However, we found that acetylated L-BEK
has a similar phenotype: it cleaves the simple
thioester substrate Z-Lys-SBzl more rapidly than
does native L-BEK (Table 2), but cannot cleave
either GD4K-na or trypsinogen (Figure 9). Thus,







Figure 9. Relative rates of substrate cleavage by enteropeptidase
variants. The activity of the indicated preparations
of enteropeptidase light chain was assayed with
the substrates Z-Lys-SBzl (open boxes), GD4K-na (filled
boxes), and trypsinogen (hatched boxes). The values
obtained are expressed as the mean percentage ± SE for
at least three independent determinations, normalized to
the activity observed for wild-type L-BEK (100%).



residues in the enteropeptidase light chain that are
sensitive to acetylation, such as Lys or Tyr, are
necessary for the recognition of peptidyl substrates.
The best candidate target to explain the effect of
acetylation is Lys99, which makes at least three
H-bonds with Asp-P2 and Asp-P4 in the L-BEK-VD4K
complex (Figure 4(d)). The other possibility,
Tyr174, makes only a single H-bond with Asp-P3.

Mutagenesis and kinetic studies support a major
contribution of Lys99 to the energetics of substrate
binding. Substitution of Lys99 by alanine caused
similar impairments in the ability of enteropeptidase
to cleave either GD4K-na or trypsinogen
(Figure 9 and Table 2), and in the rate of enteropeptidase
inhibition by VD4K-cm (Table 3). For the
latter reaction, the Lys99Ala mutation increased
ΔG_T by 1.8 kcal mol⁻¹ and acetylation of L-BEK
increased ΔG_T by 2.7 kcal mol⁻¹. Mutations at
other positively charged residues have much smaller
effects on the kinetics of substrate cleavage or
inhibition by VD4K-cm. The similar phenotypes of
acetylated L-BEK and the Lys99Ala mutant are
consistent with the importance of ionic interactions
in the recognition of substrate residues in the
P2-P4 positions, and suggest that the effects of
acetylation are due mainly to the loss of positive
charge at Lys99.

A hierarchy of functional sites participates in 
substrate recognition 

The extended contacts between L-BEK and
VD4K-cm appear to explain the preference of enteropeptidase
for similar peptidyl substrates, but do
not fully account for the efficient activation of trypsinogen.
Two-chain enteropeptidase cleaves trypsinogen
~500-fold more rapidly than does the
isolated light chain (Lu et al., 1997), indicating that
the heavy chain promotes physiological substrate
recognition. Thus, a hierarchy of functional sites
has evolved to optimize trypsinogen activation.
The catalytic center confers specificity for cleavage
after basic amino acid residues. An exosite on the
light chain, distinct from the catalytic center, recognizes
acidic trypsinogen activation peptides, and at
least one site on the heavy chain interacts with and
further accelerates the cleavage of trypsinogen.
This feature of the enteropeptidase-trypsinogen
interaction is shared by many other serine proteases
that participate in highly regulated metabolic
pathways, and it illustrates general principles
underlying the adaptation of serine proteases to
cleave a restricted range of substrates. Such adaptation
often has been accomplished by exploiting
structural features of both catalytic and non-catalytic
domains to interact with complementary
surfaces on cofactors or substrates.

Materials and Methods 

Reagents and proteins 

Bovine trypsinogen and bovine trypsin were from
Worthington (Freehold, NJ). Thiobenzyl benzyloxycarbonyl-L-lysinate
(Z-Lys-SBzl), and the enteropeptidase
substrate Gly-Asp-Asp-Asp-Asp-Lys-β-naphthylamide
(GD4K-na) were from Bachem (King of Prussia, PA).
Chromogenic substrates S-2366 (pyroGlu-Pro-Arg-p-nitroanilide)
and S-2765 (Z-D-Arg-Gly-Arg-p-nitroanilide)
were from Chromogenix (Sweden). Ovomucoid,
soybean trypsin inhibitor agarose (STI-agarose), acetic
anhydride, p-nitrophenyl p'-guanidinobenzoate, and
5,5'-dithiobis(2-nitrobenzoic acid) (DTNB) were from Sigma
(St. Louis, MO).



Table 2. Kinetic parameters for the cleavage of substrates Z-Lys-SBzl and GD4K-na

                Z-Lys-SBzl                                      GD4K-na
Enzyme          Km (µM)     kcat (s⁻¹)   kcat/Km (µM⁻¹ s⁻¹)     Km (mM)        kcat (s⁻¹)    kcat/Km (mM⁻¹ s⁻¹)
L-BEK           120 ± 10    129 ± 4      1.05                   0.61 ± 0.09    42.7 ± 4.0    70.4
Acetyl L-BEK    40 ± 10     111 ± 4      2.93                   NA             NA            NA
R60fA           120 ± 10    159 ± 19     1.36                   0.73 ± 0.08    12.7 ± 1.0    17.3
K96A            100 ± 30    108 ± 22     1.10                   1.25 ± 0.07    17.1 ± 1.5    13.7
R97A            120 ± 40    128 ± 33     1.02                   0.66 ± 0.07    25.5 ± 2.3    38.6
R98A            140 ± 10    128 ± 3      0.88                   0.77 ± 0.02    39.1 ± 0.8    51.0
K99A            50 ± 10     120 ± 1      2.53                   NA             NA            NA

Values for kcat and Km are expressed as the mean ± SE of three independent determinations. NA, activity insufficient to determine
kinetic constants.






Table 3. Kinetic parameters for the inhibition of enteropeptidase

Enzyme          k2                    Ki             k2/Ki          ΔΔG_T (kcal mol⁻¹)
L-BEK           0.013 ± 0.003         1.0 ± 0.3      [illegible]    [illegible]
Acetyl L-BEK    0.0010 ± 0.0001       7.3 ± 1.2      0.15 ± 0.02    +2.7
R60fA           0.061 ± 0.015         17 ± 5         3.59 ± 0.08    +0.8
K96A            0.0048 ± 0.0008       0.9 ± 0.3      5.9 ± 1.3      +0.5
R97A            0.0073 ± 0.0015       1.0 ± 0.3      7.5 ± 0.4      +0.3
R98A            0.0072 ± 0.0002       0.84 ± 0.04    8.7 ± 0.2      +0.3
K99A            0.00024 ± 0.00001     0.4 ± 0.2      0.6 ± 0.1      +1.8

Values for Ki and k2 are expressed as the mean ± SE of at least three independent determinations.



Plasmid constructs 

Plasmid pBlue-newL was prepared from pBEK by a
PCR mutagenesis strategy as described (Lu et al., 1997;
Nelson & Long, 1989) and encodes the human prothrombin
signal peptide (Met1-Phe28) fused to the carboxyl-terminal
251 amino acid residues of bovine enteropeptidase
(Tyr785-His1035) (Kitamoto et al., 1994). Using a
similar mutagenesis method, plasmid pBlue-newL was
altered to contain mutations encoding each of the amino
acid substitutions Arg60fAla, Lys96Ala, Arg97Ala,
Arg98Ala, and Lys99Ala. The segment encoding the
chimeric prothrombin-enteropeptidase construct was
excised from each plasmid by digestion with HindIII,
made blunt with DNA polymerase, and ligated into the
SmaI site of the expression vector pVL1392 (Pharmingen,
Carlen, CA) to yield plasmids pVLnewL, pVLR60fA,
pVLK96A, pVLR97A, pVLR98A, and pVLK99A.

A fragment of plasmid pBEK encoding amino acid
residues Cys788-His1035 of bovine enteropeptidase
(Kitamoto et al., 1994) was amplified by PCR and
inserted into the NcoI site of expression vector pET-11d
(Novagen, Madison, WI) to yield plasmid pETL. The
construct encodes two amino acid residues derived from
the vector (Met-Ala) before commencing with enteropeptidase
sequence at Cys788. For all plasmids, the segments
derived by PCR were sequenced to confirm the
accuracy of the construction.



Production of enteropeptidase light chain in
Escherichia coli (L-BEK)

E. coli BL21 (DE3) cells (Stratagene) containing pETL
were grown in two liters of LB/ampicillin medium, and
recombinant L-BEK was solubilized from the inclusion
bodies at room temperature with 10 ml of 0.1 M Tris-HCl
(pH 8.6), 1 mM EDTA-Na, 150 mM dithioerythritol,
and 6 M guanidine HCl. L-BEK was refolded by a modification
of a protocol described for the refolding of tissue
plasminogen activator from lysates of E. coli (Kohnert
et al., 1992). After centrifugation for 30 minutes at 50,000 g,
the solubilized protein was dialyzed at room temperature
against 3 M guanidine-HCl (pH 2.5), and mixed
with 10 ml of oxidation buffer (50 mM Tris-HCl (pH 9.3),
6 M guanidine-HCl, 0.1 M oxidized glutathione). After
dialysis against 3 M guanidine-HCl (pH 8.0), disulfide
exchange and refolding were initiated by dropwise
dilution with stirring into 500 ml of 0.7 M arginine-HCl
(pH 8.6), 2 mM reduced glutathione, and 1 mM EDTA.
After 72 hours, the reaction was dialyzed against 20 mM
Tris-HCl (pH 7.6), 20 mM NaCl, and then digested with
trypsin (1:50 molar ratio) for one hour. The trypsin was
inactivated with a fourfold excess of ovomucoid and
active L-BEK was purified to homogeneity by affinity
chromatography on STI-agarose. The yield was 10 mg
per two liter culture.

The N-terminal amino acid sequence of L-BEK was
determined after SDS-PAGE and electroblotting onto a
polyvinylidene difluoride membrane (Kalafatis & Mann,
1993). The product had the expected two-chain structure
and the predicted first Met residue was removed completely
during biosynthesis. The mass of L-BEK was
27,741 Da by electrospray ionization mass spectrometry,
and this value is consistent with the calculated mass of
27,739.6 Da. The concentration of L-BEK determined by
active-site titration with p-nitrophenyl p'-guanidinobenzoate
(Chase & Shaw, 1970) agreed with the value
determined spectrophotometrically at 280 nm using the
calculated extinction coefficient (Pace et al., 1995) of
70,870 M⁻¹ cm⁻¹.



Production of wild-type and mutant enteropeptidase
in baculovirus

Constructs pVLnewL, pVLR60fA, pVLK96A,
pVLR97A, pVLR98A, and pVLK99A were cotransfected
with BaculoGold DNA (Pharmingen) into Sf9 cells and
high-titer recombinant baculovirus was prepared by
repeated infection. High Five cells (1 x 10⁶ per ml, Invitrogen)
were grown in Express Five serum free medium
supplemented with 20 mM glutamine. Suspension cultures
(200 ml each) were infected with 0.5 ml virus
stock. After 72 hours, conditioned medium was collected
and adjusted to pH 8.0 by addition of ~20 ml/l of 1 M
Tris-HCl (pH 8), and precipitated glutamine was
removed by centrifugation. Recombinant enteropeptidase
was purified by affinity chromatography on STI-agarose.
The yield was up to ~15 mg of apparently
homogeneous enteropeptidase light chain per liter of
medium.



Affinity purification of enteropeptidase light chain 
variants on STI-agarose 

High Five cell conditioned medium (1000 ml) was
applied at 50 ml/hour to a column (2 ml) of STI-agarose
equilibrated with 20 mM Tris-HCl (pH 7.5), 50 mM
NaCl, at 4 °C. The column was washed with 10 ml of
20 mM Tris-HCl (pH 7.5), 1 M NaCl, followed by 50 ml
of 20 mM Tris-HCl (pH 7.5). Enteropeptidase was eluted
with 50 mM glycine-HCl (pH 3.0); 1 ml fractions were
collected and neutralized immediately with 50 µl of 2 M
Tris-HCl (pH 8.0). Refolded and trypsin-activated L-BEK
prepared in E. coli was purified similarly, applying the
product obtained from a two liter culture to the column.
Fractions were analyzed by SDS-PAGE (Laemmli, 1970)
and silver staining (Morrissey, 1981), pooled, dialyzed
against 20 mM Tris-HCl (pH 7.5), 50 mM NaCl, and
stored at -70 °C.



Preparation of a stoichiometric complex of L-BEK 
and VDDDDK-chloromethane 

The active site directed inhibitor Val-(Asp)4-Lys-chloromethane
(VD4K-cm) was synthesized (Haematologic
Technologies, Inc.) and its structure was confirmed by
amino acid composition. Electrospray ionization mass
spectrometry gave a mass of 739.3 Da and the predicted
mass was 739.2 Da. Affinity-purified L-BEK from E. coli
(10 mg) in 100 ml of 20 mM Tris-HCl (pH 7.5), 50 mM
NaCl, was reacted on ice with 50 ml of 100 µM VD4K-cm
added dropwise over 60 minutes. The L-BEK-VD4K
complex was dialyzed at 4 °C against 20 mM Tris-HCl
(pH 7.5), 50 mM NaCl, and concentrated to 25 mg/ml
by ultrafiltration (Centricon-30, Amicon). The mass
determined by electrospray ionization mass spectrometry
(28,448 Da) was consistent with the mass calculated for
the expected stoichiometric complex (28,442.3 Da).
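
The mass bookkeeping for the covalent complex can be checked with a short calculation. This is a minimal sketch, assuming that alkylation by the chloromethane inhibitor releases one equivalent of HCl; that leaving group is an assumption introduced here for illustration (the text quotes only the final calculated mass), and the function name is hypothetical.

```python
# Masses quoted in the text (Da)
MASS_L_BEK = 27739.6    # calculated mass of L-BEK
MASS_VD4K_CM = 739.2    # predicted mass of VD4K-cm
OBSERVED_COMPLEX = 28448.0

# Assumption (not stated in the text): one HCl is lost on covalent attachment.
MASS_HCL = 36.46


def expected_complex_mass(protein, inhibitor, leaving_group=MASS_HCL):
    """Mass of a 1:1 covalent adduct after loss of the leaving group."""
    return protein + inhibitor - leaving_group


expected = expected_complex_mass(MASS_L_BEK, MASS_VD4K_CM)
ppm_error = (OBSERVED_COMPLEX - expected) / expected * 1e6
print(f"expected {expected:.1f} Da, observed {OBSERVED_COMPLEX:.0f} Da, "
      f"difference {ppm_error:.0f} ppm")
```

Under this assumption the calculated value reproduces the 28,442.3 Da quoted for the stoichiometric complex, and the observed mass differs from it by roughly 200 ppm.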



Crystallization of L-BEK and data collection 

Crystals of L-BEK-VD4K complex were grown at 20 °C
in a hanging drop against a reservoir of 100 mM sodium
cacodylate (pH 5.0), 10 mM zinc sulfate, and 10% (w/v)
PEG-400 at a protein concentration of 4 mg/ml. The
crystals were orthorhombic (P2₁2₁2₁) with one molecule
per asymmetric unit and cell dimensions of a = 39.99 Å,
b = 70.65 Å, and c = 85.22 Å. A crystal was transferred
into cryoprotectant buffer containing 100 mM sodium
cacodylate (pH 5.0), 20 mM zinc sulfate and 25% (w/v)
PEG-400, and frozen at 100 K in a stream of nitrogen
vapor. Data were collected using a Rigaku RaxisII image
plate detector mounted on a Rigaku RU200 rotating copper
anode. A data set complete to 2.3 Å resolution was
collected. Data were processed and scaled using the programs
DENZO and SCALEPACK (Otwinowski & Minor,
1996).



Structure determination and refinement 

Initial phases for the structure of L-BEK were obtained
by molecular replacement, using the program AMoRe
(Navaza, 1994) and the crystal structure of γ-chymotrypsin
(PDB entry code 1GCD) (Harel et al., 1991) as the
search model. A strong unique solution was found, with
correlation factors of 0.38 and 0.17 for the highest and
second highest peak, respectively. Rigid body refinement
followed by positional refinement using X-PLOR
(Brünger, 1992) resulted in values for R and R_free of
43.0% and 49.2%, respectively.

The rebuilding process, using the program O (Jones &
Thirup, 1986; Jones et al., 1991), started by aligning the
primary sequences of L-BEK and γ-chymotrypsin. The
model was modified by removing the diethyl phosphate
inhibitor from the chymotrypsin structure, trimming
loop regions of poor sequence conservation, and then by
substituting the γ-chymotrypsin residues either by
alanine or by their proper counterparts in L-BEK,
depending on the degree of sequence conservation.
Further decreases in R and R_free were achieved by using
the structure of thrombin (PDB entry code 1PPB) (Bode
et al., 1992) as a guide in regions where sequence conservation
with L-BEK suggested structural similarity, building
the Cα trace into 2Fo − Fc maps. At this point the
value for R_free dropped to 38.5%, and R decreased to
33.5%.

With the Cα trace in place, the model was subjected to
two rounds of rebuilding guided by simulated annealing
omit maps (Hodel et al., 1992) in order to eliminate
model bias of the initial search model with intermittent
positional refinement, using the maximum likelihood
target in the program CNSsolve 0.5 (Brünger et al., 1998),
resulting in a value for R_free of 33.5% that decreased to
31.5% after individual B-factor refinement. A total of 45
water molecules were added to the model and verified
by inspection of the 2Fo − Fc electron density map. Two
large spherical patches of electron density, clamped
between acidic side-chains of symmetry-related
molecules, were interpreted as Zn²⁺, consistent with the
presence of 20 mM zinc sulfate in the cryoprotectant solution.
Their incorporation into the model led to a small
but significant decrease of both R and R_free factors. The
inhibitor Lys residue could be seen in 2Fo − Fc maps at
an early stage of the building process, yet the remaining
five residues were elusive until later in the refinement
process. Eventually, residues Lys-P1 through Asp-P4
could be built in an unequivocal manner into simulated
annealing omit maps, with density missing for the two
N-terminal amino acid residues of the inhibitor, Asp-P5
and Val-P6. The final model comprises residues 1
through 7 of the heavy chain, residues 16 through 243 of
the serine protease domain of enteropeptidase, residues
P1 through P4 of the VD4K-cm inhibitor, two Zn²⁺ and
108 water molecules. The side-chains of Lys3, Arg97 and
Asn205 lacked electron density and were built as Ala.
After bulk solvent correction and individual B-factor
refinement, the model converged to R = 23.4% and
R_free = 26.9% for the resolution range 30-2.3 Å, using a
cut-off of F/σ(F) > 2.0, with excellent stereochemistry
and B-factors appropriately restrained (Table 1). There
are no residues in disallowed regions of the Ramachandran
plot, and only two residues in generously allowed
regions.

Preparation of acetylated enteropeptidase light chain 

Purified L-BEK from baculovirus (5.5 µM, 4 ml) in
0.1 M sodium phosphate (pH 7.0), was stirred on ice
with 6 µl acetic anhydride added in three portions. The
reaction was maintained at pH 7.0 by the dropwise
addition of sodium hydroxide. After one hour, the reaction
was dialyzed against 20 mM Tris-HCl (pH 7.6),
20 mM NaCl.



Enzyme kinetics 

The concentration of each enteropeptidase was determined
by active-site titration with p-nitrophenyl
p'-guanidinobenzoate (Chase & Shaw, 1970). Kinetic parameters
for cleavage of Z-Lys-SBzl were obtained as
described (Green & Shaw, 1979). Assays were performed
at room temperature in 1 ml of 0.1 M Tris-HCl (pH 8.0),
260 µM DTNB, and 10 µM to 500 µM Z-Lys-SBzl. Reaction
was initiated by adding enzyme (0.2 to 1.6 nM) and
the rate of 3-carboxy-4-nitrophenoxide production was
calculated from the absorbance at 412 nm, using an
extinction coefficient of 13,600 M⁻¹ cm⁻¹.
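
The conversion from an absorbance slope to a reaction rate follows the Beer-Lambert relation. The sketch below is illustrative only: the 1 cm path length, the function names, and the example reading are assumptions made here, while the extinction coefficient is the value quoted above.

```python
EPSILON_412 = 13600.0  # M^-1 cm^-1, extinction coefficient quoted in the text


def product_rate(delta_a412_per_min, path_cm=1.0):
    """Convert an absorbance slope (A412 per minute) to product formed in M per second.

    path_cm is the optical path length; 1 cm is an assumption, not stated in the text.
    """
    molar_per_min = delta_a412_per_min / (EPSILON_412 * path_cm)
    return molar_per_min / 60.0


def turnover(delta_a412_per_min, enzyme_molar, path_cm=1.0):
    """Rate per active site (s^-1) for a given active enzyme concentration in M."""
    return product_rate(delta_a412_per_min, path_cm) / enzyme_molar


# Hypothetical reading: 0.0816 absorbance units per minute with 1 nM enzyme.
rate = product_rate(0.0816)
per_site = turnover(0.0816, 1.0e-9)
print(f"{rate:.3e} M/s, {per_site:.1f} per second per active site")
```

Dividing by the active-site concentration obtained from titration gives a rate per enzyme molecule, which is the quantity that enters the Michaelis-Menten analysis.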

Kinetic parameters for the cleavage of the synthetic
peptide substrate GD4K-na were determined as
described (Grant & Hermon-Taylor, 1979; Lu et al., 1997).
Values for kcat and Km were obtained by directly fitting
to the Michaelis-Menten equation by non-linear least
squares regression. Under all assay conditions, the consumption
of substrate (Z-Lys-SBzl or GD4K-na) was
<15% of the total.
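
A direct non-linear least squares fit to the Michaelis-Menten equation can be sketched without any fitting library. The text does not say which algorithm or software was used; the approach below (a closed-form solution for the maximal rate at each trial Km, combined with a golden-section search over ln Km) is a substitute chosen here for illustration. The substrate concentrations are hypothetical, and the rates are synthetic, noise-free values generated from the wild-type GD4K-na parameters in Table 2.

```python
import math


def best_vmax(s_values, rates, km):
    """For a fixed Km the model is linear in Vmax, so solve for it in closed form."""
    x = [s / (km + s) for s in s_values]
    return sum(v * xi for v, xi in zip(rates, x)) / sum(xi * xi for xi in x)


def sse(s_values, rates, km):
    """Sum of squared residuals at this Km, using the best Vmax for it."""
    vmax = best_vmax(s_values, rates, km)
    return sum((v - vmax * s / (km + s)) ** 2 for s, v in zip(s_values, rates))


def fit_michaelis_menten(s_values, rates, grid_points=60, iterations=100):
    """Coarse grid over ln(Km) to bracket the minimum, then golden-section search."""
    lo = math.log(min(s_values) / 100.0)
    hi = math.log(max(s_values) * 100.0)
    step = (hi - lo) / grid_points
    grid = [lo + i * step for i in range(grid_points + 1)]
    best = min(grid, key=lambda g: sse(s_values, rates, math.exp(g)))
    lo, hi = best - step, best + step
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = hi - phi * (hi - lo), lo + phi * (hi - lo)
    fa, fb = sse(s_values, rates, math.exp(a)), sse(s_values, rates, math.exp(b))
    for _ in range(iterations):
        if fa < fb:
            hi, b, fb = b, a, fa
            a = hi - phi * (hi - lo)
            fa = sse(s_values, rates, math.exp(a))
        else:
            lo, a, fa = a, b, fb
            b = lo + phi * (hi - lo)
            fb = sse(s_values, rates, math.exp(b))
    km = math.exp((lo + hi) / 2.0)
    return km, best_vmax(s_values, rates, km)


# Hypothetical substrate concentrations (mM); rates per active site (s^-1)
substrate_mm = [0.1, 0.2, 0.5, 1.0, 2.0, 5.0]
true_km, true_kcat = 0.61, 42.7  # wild-type GD4K-na values from Table 2
rates = [true_kcat * s / (true_km + s) for s in substrate_mm]

km_fit, kcat_fit = fit_michaelis_menten(substrate_mm, rates)
print(f"Km = {km_fit:.3f} mM, kcat = {kcat_fit:.2f} per second")
```

Fitting the untransformed rates, rather than a linearized double-reciprocal plot, keeps the error weighting of the original measurements, which is the usual reason for preferring a direct fit.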

Trypsinogen activation was assayed at pH 5.6 as
described (Anderson et al., 1977; Lu et al., 1997). Assays
(0.1 ml) contained 25 µM trypsinogen, 50 mM sodium
citrate (pH 5.6) at room temperature. Reaction was
initiated by addition of 2 nM enteropeptidase. After ten
minutes, reaction was terminated by the addition of 2 µl
of 2 M HCl. To quantify the trypsin product, an equal
volume of 250 µM S-2765 in 20 mM Tris-HCl (pH 8.4),
150 mM NaCl was added and absorbance at 405 nm
recorded after five minutes.

Changes in the free energy of transition state stabilization
(ΔΔG_T) were calculated from the relationship
$\Delta\Delta G_T = -RT \ln\left[(k_{\mathrm{cat}}/K_m)_{\mathrm{mutant}}/(k_{\mathrm{cat}}/K_m)_{\mathrm{wild\text{-}type}}\right]$, where
R is the gas constant, T is the absolute temperature, kcat
is the turnover number, and Km is the Michaelis constant
(Wilkinson et al., 1983).
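
As a worked illustration of this relationship, the sketch below recomputes the free energy changes from the GD4K-na catalytic efficiencies in Table 2 and from the relative activities quoted for the Lys99Ala mutant. The temperature of 298.15 K is an assumption standing in for "room temperature", and the function and variable names are introduced here for illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def ddg_t(ratio_mutant_over_wt, temp_k=298.15):
    """ddG_T = -RT ln(ratio), in kcal/mol. temp_k = 298.15 K is an assumption."""
    return -R_KCAL * temp_k * math.log(ratio_mutant_over_wt)


# kcat/Km for GD4K-na (mM^-1 s^-1), taken from Table 2
KCAT_KM = {"L-BEK": 70.4, "R60fA": 17.3, "K96A": 13.7, "R97A": 38.6, "R98A": 51.0}

ddg_table2 = {
    name: round(ddg_t(value / KCAT_KM["L-BEK"]), 1)
    for name, value in KCAT_KM.items()
    if name != "L-BEK"
}

# Relative activities of about 3% and 1.5% quoted for Lys99Ala in the Results
ddg_k99a = [round(ddg_t(fraction), 1) for fraction in (0.03, 0.015)]

print(ddg_table2)
print(ddg_k99a)
```

With these inputs, Arg60fAla and Lys96Ala come out at about 0.8 and 1.0 kcal/mol, and the Lys99Ala relative activities give about 2.1 and 2.5 kcal/mol, in line with the ranges stated in the Results.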

Inhibition by VD4K-chloromethane 

Reactions were performed in 200 µl of 100 mM Tris-HCl
(pH 8.0), VD4K-cm (2 nM to 2 µM) and 2 nM enteropeptidase
at 22 °C. At selected time intervals, 30 µl
samples were removed and added to 200 µl of 100 mM
Tris-HCl (pH 8.0), 300 µM Z-Lys-SBzl, and 180 µM
DTNB to assay the remaining active enteropeptidase. For
each concentration of inhibitor, the pseudo first-order
rate constant for inactivation, $k'$, was determined from
the relationship $\ln E = -k't + \ln E_0$, where $E$ is the concentration
of active enzyme remaining at time $t$, and $E_0$
is the initial or total concentration of enzyme.
The second-order rate constant for inactivation, $k_2$, and
the dissociation constant for reversible inhibitor binding,
$K_i$, were determined from the relationship $k' = k_2[I]/([I] + K_i)$,
where $[I]$ is the inhibitor concentration (Kitz &
Wilson, 1962). Changes in the free energy of transition
state stabilization ($\Delta\Delta G_T$) were calculated from the
relationship $\Delta\Delta G_T = -RT \ln[(k_2/K_i)_{mutant}/(k_2/K_i)_{wild-type}]$
(Wilkinson et al., 1983).
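
The two-stage analysis described above can be sketched as follows. The first stage follows the stated relationship directly (slope of ln E against time). For the second stage the source gives only the Kitz and Wilson relationship, not the fitting procedure, so the double-reciprocal linear fit used here is an assumption, as are the function names.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pseudo_first_order_k(times, active_enzyme):
    """k' from ln E = -k' t + ln E0 (negative slope of ln E versus t)."""
    slope, _ = linear_fit(times, [math.log(e) for e in active_enzyme])
    return -slope


def kitz_wilson_fit(inhibitor_concs, k_obs):
    """Estimate (k2, Ki) from k' = k2 [I] / ([I] + Ki).

    Assumed method: linearize as 1/k' = (Ki/k2)(1/[I]) + 1/k2.
    """
    slope, intercept = linear_fit(
        [1.0 / i for i in inhibitor_concs], [1.0 / k for k in k_obs]
    )
    k2 = 1.0 / intercept
    return k2, slope * k2
```

Residual activity values may be in any unit proportional to active enzyme, since only the slope of ln E matters. A double-reciprocal fit weights low inhibitor concentrations heavily, so a direct non-linear fit may be preferable with noisy data.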

Protein Data Bank accession number 

The coordinates have been deposited with the Protein 
Data Bank for immediate release under accession code 
1ekb.



Acknowledgements 

We thank Milan Kapadia for assistance in the purification
of enteropeptidase variants, Dr Mark Crankshaw
for performing the mass spectrometry analyses, and Dr
Enrico Di Cera for advice on the refolding of recombinant
proteases expressed in E. coli. This work was supported
in part by National Institutes of Health grants
DK50053 (to J.E.S.), GM54033 (to G.W.), and T32HL07088
(to D.L.). J.E.S. is an Investigator and D.L. was an Associate
of the Howard Hughes Medical Institute. D.L. and
K.F. contributed equally to this work.

References 

Anderson, L. E., Walsh, K. A. & Neurath, H. (1977).
Bovine enterokinase. Purification, specificity, and
some molecular properties. Biochemistry, 16, 3354-3360.

Baratti, J. & Maroux, S. (1976). On the catalytic and
binding sites of porcine enteropeptidase. Biochim.
Biophys. Acta, 452, 488-496.

Bode, W., Turk, D. & Karshikov, A. (1992). The refined
1.9-Å X-ray crystal structure of D-Phe-Pro-Arg
chloromethylketone-inhibited human alpha-thrombin:
structure analysis, overall structure, electrostatic
properties, detailed active-site geometry, and
structure-function relationships. Protein Sci. 1, 426-471.

Bricteux-Gregoire, S., Schyns, R. & Florkin, M. (1972).
Phylogeny of trypsinogen activation peptides.
Comp. Biochem. Physiol. 42B, 23-39.

Brünger, A. T. (1992). X-PLOR Version 3.1: A System for
Crystallography and NMR, Yale University Press,
New Haven, CT.

Brünger, A. T., Adams, P. D., Clore, G. M., DeLano,
W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J. S.,
Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J.,
Rice, L. M., Simonson, T. & Warren, G. L. (1998).
Crystallography & NMR system: a new software
suite for macromolecular structure determination.
Acta Crystallog. sect. D, 54, 905-921.

Carson, M. (1997). Ribbons. Methods Enzymol. 277, 493-505.

Chase, T. & Shaw, E. (1970). Titration of trypsin, plasmin,
and thrombin with p-nitrophenyl p'-guanidinobenzoate
HCl. Methods Enzymol. 19, 20-27.

Engh, R. A. & Huber, R. (1991). Accurate bond and 
angle parameters for X-ray protein structure refine- 
ment. Acta Crystallog. sect. A, 47, 392-400. 

Grant, D. A. W. & Hermon-Taylor, J. (1979). Hydrolysis 
of artificial substrates by enterokinase and trypsin 
and the development of a sensitive specific assay 
for enterokinase in serum. Biochim. Biophys. Acta, 
567, 207-215. 

Green, G. D. G. & Shaw, E. (1979). Thiobenzyl benzyloxycarbonyl-L-lysinate,
substrate for a sensitive colorimetric
assay for trypsin-like enzymes. Anal.
Biochem. 93, 223-236.

Hadorn, B., Tarlow, M. J., Lloyd, J. K. & Wolff, O. H.
(1969). Intestinal enterokinase deficiency. Lancet, i,
812-813.

Harel, M., Su, C. T., Frolow, F., Ashani, Y., Silman, I. &
Sussman, J. L. (1991). Refined crystal structures of
"aged" and "non-aged" organophosphoryl conjugates
of gamma-chymotrypsin. J. Mol. Biol. 221, 909-918.

Haworth, J. C., Gourley, B., Hadorn, B. & Sumida, C.
(1971). Malabsorption and growth failure due to
intestinal enterokinase deficiency. J. Pediatr. 78, 481-490.

Hodel, A., Kim, S.-H. & Brünger, A. (1992). Model bias
in crystal structures. Acta Crystallog. sect. A, 48, 851-858.

Jones, T. A. & Thirup, S. (1986). Using known substructures
in protein model building and crystallography.
EMBO J. 5, 819-822.

Jones, T. A., Zou, J.-Y., Cowan, S. W. & Kjeldgaard, M. 
(1991). Improved methods for building protein 
models in electron density maps and the location of 
errors in these models. Acta Crystallog. sect. A, 47, 
110-119. 

Kalafatis, M. & Mann, K. G. (1993). Role of the membrane
in the inactivation of factor Va by activated
protein C. J. Biol. Chem. 268, 27246-27257.






Kitamoto, Y., Yuan, X., Wu, Q., McCourt, D. W. & 
Sadler, J. E. (1994). Enterokinase, the initiator of 
intestinal digestion, is a mosaic protease composed 
of a distinctive assortment of domains. Proc. Natl 
Acad. Sci. USA, 91, 7588-7592. 
Kitz, R. & Wilson, I. B. (1962). Esters of methanesulfonic
acid as irreversible inhibitors of acetylcholinesterase.
J. Biol. Chem. 237, 3245-3249.
Kohnert, U., Rudolph, R., Verheijen, J. H., Weening-Verhoeff,
E. J. D., Stern, A., Opitz, U., Martin, U.,
Lill, H., Prinz, H., Lechner, M., Kresse, G.-B.,
Buckel, P. & Fischer, S. (1992). Biochemical properties
of the kringle 2 and protease domains are maintained
in the refolded t-PA deletion variant BM 06.022.
Protein Eng. 5, 93-100.
Krem, M. M. & Di Cera, E. (1998). Conserved water
molecules in the specificity pocket of serine proteases
and the molecular mechanism of Na+ binding.
Proteins: Struct. Funct. Genet. 30, 34-42.
Laemmli, U. K. (1970). Cleavage of structural proteins 
during the assembly of the head of bacteriophage 
T4. Nature, 227, 680-685. 
LaVallie, E. R., Rehemtulla, A., Racie, L. A., DiBlasio,
E. A., Ferenz, C., Grant, K. L., Light, A. & McCoy,
J. M. (1993). Cloning and functional expression of a
cDNA encoding the catalytic subunit of bovine
enterokinase. J. Biol. Chem. 268, 23311-23317.
Light, A. & Fonseca, P. (1984). The preparation and
properties of the catalytic subunit of bovine enterokinase.
J. Biol. Chem. 259, 13195-13198.
Light, A., Savithri, H. S. & Liepnieks, J. J. (1980). Specificity
of bovine enterokinase toward protein substrates.
Anal. Biochem. 106, 199-206.
Lu, D. & Sadler, J. E. (1998). Enteropeptidase. In Hand- 
book of Proteolytic Enzymes (Barrett, A. J., Rawlings, 
N. D. & Woessner, J. F., Jr, eds), pp. 50-54, 
Academic Press Ltd, London. 
Lu, D., Yuan, X., Zheng, X. & Sadler, J. E. (1997). Bovine
proenteropeptidase is activated by trypsin, and the
specificity of enteropeptidase depends on the heavy
chain. J. Biol. Chem. 272, 31293-31300.
Maroux, S., Baratti, J. & Desnuelle, P. (1971). Purification
and specificity of porcine enterokinase. J. Biol. Chem.
246, 5031-5039.
Matsushima, M., Ichinose, M., Yahagi, N., Kakei, N.,
Tsukada, S., Miki, K., Kurokawa, K., Tashiro, K.,
Shiokawa, K., Shinomiya, K., Umeyama, H., Inoue,
H., Takahashi, T. & Takahashi, K. (1994). Structural
characterization of porcine enteropeptidase. J. Biol.
Chem. 269, 19976-19982.
Mikhailova, A. G. & Rumsh, L. D. (1999). Autolysis of
bovine enteropeptidase heavy chain: evidence of
fragment 118-465 involvement in trypsinogen activation.
FEBS Letters, 442, 226-230.
Morrissey, J. H. (1981). Silver stain for proteins in polyacrylamide
gels: a modified procedure with
enhanced uniform sensitivity. Anal. Biochem. 117,
307-310.

Navaza, J. (1994). AMoRe: an automated package for 
molecular replacement. Acta Crystallog. sect. A, 50, 
157-163. 

Nelson, R. M. & Long, G. L. (1989). A general method
of site-specific mutagenesis using a modification of
the Thermus aquaticus polymerase chain reaction.
Anal. Biochem. 180, 147-151.

Nicholls, A., Sharp, K. A. & Honig, B. (1991). Protein
folding and association: insights from the interfacial
and thermodynamic properties of hydrocarbons.
Proteins: Struct. Funct. Genet. 11, 281-296.

Otwinowski, Z. & Minor, W. (1996). Processing of X-ray
diffraction data collected in oscillation mode.
Methods Enzymol. 276, 307-326.

Pace, C. N., Vajdos, F., Fee, L., Grimsley, G. & Gray, T.
(1995). How to measure and predict the molar
absorption coefficient of a protein. Protein Sci. 4,
2411-2423.

Pavlov, I. P. (1902). The Work of the Digestive Glands,
Trans. W. H. Thompson, Charles Griffin & Co.,
London.

Schechter, I. & Berger, A. (1967). On the size of the 
active site in proteases. I. Papain. Biochem. Biophys. 
Res. Commun. 27, 157-162. 
Sheehan, J. P. & Sadler, J. E. (1994). Molecular mapping
of the heparin-binding exosite of thrombin. Proc.
Natl Acad. Sci. USA, 91, 5518-5522.
Vijayalakshmi, J., Padmanabhan, K. P., Mann, K. G. &
Tulinsky, A. (1994). The isomorphous structures of
prethrombin2, hirugen-, and PPACK-thrombin:
changes accompanying activation and exosite binding
to thrombin. Protein Sci. 3, 2254-2271.
Wilkinson, A. J., Fersht, A. R., Blow, D. M. & Winter, G.
(1983). Site-directed mutagenesis as a probe of
enzyme structure and catalysis: tyrosyl-tRNA
synthetase cysteine-35 to glycine-35 mutation.
Biochemistry, 22, 3581-3586.
Wu, Q., Sheehan, J. P., Tsiang, M., Lentz, S. R., Birktoft,
J. J. & Sadler, J. E. (1991). Single amino acid substitutions
dissociate fibrinogen-clotting and thrombomodulin-binding
activities of human thrombin.
Proc. Natl Acad. Sci. USA, 88, 6775-6779.
Yahagi, N., Ichinose, M., Matsushima, M., Matsubara,
Y., Miki, K., Kurokawa, K., Fukamachi, H., Tashiro,
K., Shiokawa, K., Kageyama, T., Takahashi, T.,
Inoue, H. & Takahashi, K. (1996). Complementary
DNA cloning and sequencing of rat enteropeptidase
and tissue distribution of its mRNA. Biochem.
Biophys. Res. Commun. 219, 806-812.
Yuan, X., Zheng, X. L., Lu, D. S., Rubin, D. C., Pung,
C. Y. M. & Sadler, J. E. (1998). Structure of murine
enterokinase (enteropeptidase) and expression in
small intestine during development. Am. J. Physiol.
37, G342-G349.



Edited by R. Huber 



(Received 27 May 1999; received in revised form 23 July 1999; accepted 26 July 1999)





Exhibit 22 



THE JOURNAL OF BIOLOGICAL CHEMISTRY
© 1994 by The American Society for Biochemistry and Molecular Biology, Inc.

Vol. 269, No. 31, Issue of August 5, pp. 19976-19982, 1994
Printed in U.S.A.



Structural Characterization of Porcine Enteropeptidase*

(Received for publication, March 8, 1994, and in revised form, April 11, 1994) 

Masashi Matsushima‡§, Masao Ichinose‡, Naohisa Yahagi‡, Nobuyuki Kakei‡, Shinko Tsukada‡,
Kazumasa Miki‡, Kiyoshi Kurokawa‡, Kosuke Tashiro¶, Koichiro Shiokawa¶, Kazuko Shinomiya‖,
Hideaki Umeyama‖, Hideshi Inoue§, Takayuki Takahashi§, and Kenji Takahashi§**

From the ‡First Department of Internal Medicine, Faculty of Medicine and the Departments of §Biophysics and
Biochemistry and ¶Zoology, Faculty of Science, University of Tokyo, Tokyo 123 and the ‖School of Pharmaceutical Sciences,
Kitasato University, Tokyo 108, Japan



Enteropeptidase (EC 3.4.21.9) is a key enzyme in the
intestinal digestion cascade responsible for the conversion
of trypsinogen to trypsin, which then activates various
pancreatic zymogens. In order to structurally characterize
the enzyme, we purified the enzyme from
porcine duodenal mucosa and showed that it consists of
three polypeptide chains, which we named "mini" chain
(M chain), light chain (L chain), and heavy chain (H
chain) in order of increasing molecular size. Based on
their NH2-terminal sequences, a cDNA clone for porcine
enteropeptidase was isolated and analyzed. The clone
was 3597 base pairs long, which encoded 1034 amino
acid residues of a single-chain precursor form of enteropeptidase.
The precursor contained an additional
NH2-terminal 51-residue sequence including a putative
internal signal sequence, followed by the M chain (66
residues), the H chain (682 residues), and the L chain
(235 residues) in that order. The H chain had regions
partially homologous in sequence with low density lipoprotein
receptor and complement components. On
the other hand, the L chain was highly homologous with
the catalytic domains of trypsin-like serine proteinases.
The structural model of the L chain suggests that the
sequence, Arg-Arg-Arg-Lys, is probably involved in
the unique substrate specificity of the enzyme, preferring
acidic amino acid residues at the P2-P5 sites.



Enteropeptidase (enterokinase, EC 3.4.21.9) is well known
and physiologically the only enzyme capable of converting trypsinogen
to trypsin (1). Trypsin thus produced then converts
various pancreatic zymogens including trypsinogen itself to
their corresponding active enzymes. Therefore, enteropeptidase
has been recognized to play a key role in regulating intestinal
protein digestion. Indeed, patients with primary enteropeptidase
deficiency, a genetic disorder with no or little
enteropeptidase activity in the duodenum, have been reported
to suffer from malabsorption and malnutrition, particularly in
infancy, and need to take drugs containing a pancreatic enzyme
mixture for recovery (2).

Because of its physiological importance, there have been a 
number of studies on the purification and characterization of 
enteropeptidase from various species (3-9). These studies have 



* This work was supported in part by grants-in-aid for scientific research
from the Ministry of Education, Science and Culture of Japan.
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
"advertisement" in accordance with 18 U.S.C. Section 1734 solely to
indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted
to the GenBank™/EMBL Data Bank with accession number(s) D30799.

** To whom correspondence and reprint requests should be addressed.
Tel.: 81-3-5689-5607; Fax: 81-3-5802-2041.



shown that the enzyme is classified as a trypsin-like serine
proteinase having strict specificity toward substrates with a
basic amino acid residue at the P1 site¹ and acidic residues at
the P2-P5 sites as expected from the NH2-terminal amino acid
sequence (Val¹-Asp-Asp-Asp-Asp-Lys⁶) of bovine trypsinogen.
In contrast, structural information on the enzyme is still limited.
Its molecular weight thus far reported ranges from
150,000 to 300,000, depending on the difference in species. In
addition, the number of constituent polypeptide chains has
been reported differently; the enzyme was reported to be composed
of two chains in pig (4) and cow (7, 9) and three chains in
human (10). Available data indicate that in all cases the
smaller polypeptide chain, called the light chain, is a catalytic
chain (4, 10, 11), but the precise chain composition is not yet
clear. This is largely due to lack of information on the complete
amino acid sequence of enteropeptidase, although the bovine
light chain sequence has been reported very recently by
LaVallie et al. (12).

We have recently established a purification procedure for
enteropeptidase from porcine duodenal mucosa and found that,
unlike the previous data (4), the enzyme consists of three different
polypeptide chains, i.e. "mini" (M),² light (L), and heavy
(H) chains. Furthermore, we have cloned and analyzed a cDNA
coding for the protein and deduced its complete amino acid
sequence. The results clearly indicate that enteropeptidase is
synthesized as a single-chain precursor protein and then is
processed to the mature enzyme. In this paper, we describe
these results and discuss the substrate specificity of the enzyme
based on the three-dimensional structure constructed by
computer modeling.

MATERIALS AND METHODS 

Determination of Protein Concentration — Protein concentration was
estimated colorimetrically by using a protein assay kit (Bio-Rad) and
mouse IgG as the standard (13).

Enzyme Purification — Enzyme activity was assayed essentially according
to Liepnieks and Light (7) with some modification. The purification
procedure will be described in detail elsewhere. In brief, the
mucosa was obtained from 40 porcine duodena by squeezing them with
the fingers in 20 mM Tris-HCl (pH 6.0), and the crude extract was
obtained from the mucosa by solubilizing with 1% sodium deoxycholate
followed by centrifugation. The enzyme was purified from the extract by
four steps of chromatography on columns of DE52 (5.4 x 40 cm, Whatman),
Butyl Toyopearl 650S (2 x 20 cm, prepacked, Tosoh), Sephacryl
S-300 (3.6 x 90 cm, Pharmacia Biotech Inc.), and benzamidine-Sepharose
(0.9 x 25 cm, Pharmacia). The enzyme fractions obtained from the
last column were pooled, concentrated, and used for further experiments.



¹ The nomenclature is according to Berger and Schechter (60).

² The abbreviations used are: M chain, "mini" chain; L chain, light
chain; H chain, heavy chain; LDL, low density lipoprotein; PAGE, polyacrylamide
gel electrophoresis.






Structure of Porcine Enteropeptidase 






TABLE I

Purification of porcine duodenal enteropeptidase
EKU is defined as nanomoles of trypsin produced in 30 min at 37 °C.

Step                    Total protein   Total activity   Specific activity   Yield   Purification
                        mg              EKU              EKU/mg              %       -fold
Crude extract           4,730           157,000          33.2                100     1
DE52                    304             58,300           192                 37.1    5.8
Butyl Toyopearl         35.2            27,600           720                 17.5    21.7
Sephacryl S-300         2.94            13,300           4,530               8.5     136
Benzamidine-Sepharose   0.42            10,000           24,200              6.4     729
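
The derived columns of a purification table of this kind follow from total protein and total activity alone. The sketch below shows the arithmetic, using the first and last steps of Table I as input; the function name and dictionary keys are illustrative assumptions. Values recomputed this way can differ slightly from the printed ones, presumably because the printed entries are rounded.

```python
def purification_summary(steps):
    """Compute specific activity, yield, and fold purification per step.

    steps: list of (name, total_protein_mg, total_activity_eku);
    the first entry is taken as the starting material (100 % yield, 1-fold).
    """
    _, first_protein, first_activity = steps[0]
    base_specific = first_activity / first_protein
    rows = []
    for name, protein_mg, activity_eku in steps:
        specific = activity_eku / protein_mg
        rows.append({
            "step": name,
            "specific_activity": specific,                        # EKU/mg
            "yield_percent": 100.0 * activity_eku / first_activity,
            "fold": specific / base_specific,
        })
    return rows


# First and last rows of Table I as printed.
table = purification_summary([
    ("Crude extract", 4730.0, 157000.0),
    ("Benzamidine-Sepharose", 0.42, 10000.0),
])
```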


[Fig. 1 gel image: lanes a, b, and c, with molecular mass markers and bands labeled H, L, and M.]



Fig. 1. SDS-PAGE patterns of the purified enzyme. a, under
reducing conditions using a gradient gel of 4-20%; b, under reducing
conditions using a gradient gel of 15-25%; c, under nonreducing conditions
using a gradient gel of 15-25%. Approximately 30 µg of the enzyme
was applied to each lane.



Polyacrylamide Gel Electrophoresis — Polyacrylamide gel electrophoresis
(PAGE) was performed essentially according to Laemmli (14)
using SDS-PAG plate 4/20 and Multigel 15/25 (Daiichi, Tokyo).

NH2-terminal Amino Acid Sequence Analysis — The purified enzyme
sample was subjected to SDS-PAGE using 4-20 or 15-25% gradient
gels, and the separated polypeptides were transferred to Immobilon P
(Millipore) or Immobilon PSQ (Millipore) essentially according to
LeGendre and Matsudaira (15). The proteins on the membranes were
analyzed with an automated protein sequenator (model 477A, Applied
Biosystems) on-line to a phenylthiohydantoin-derivative analyzer
(model 120A, Applied Biosystems).

cDNA Cloning and Analyses — The total RNA was extracted from
freshly resected porcine duodenal mucosa by the guanidinium isothiocyanate
method and purified by CsCl density gradient ultracentrifugation
(16). The poly(A) RNA was isolated using Oligotex dT-30 super (Takara).
Complementary double-stranded DNA was synthesized using a cDNA
synthesis system plus (Amersham Corp.) from 5 µg of the poly(A) RNA
as a template with oligo(dT) or random hexanucleotide as a primer (17).
The cDNA libraries were constructed using a cDNA cloning system
(Amersham Corp.), except that λZAP II/EcoRI vector (Stratagene) was
used. A 53-mer oligonucleotide described under "Results" was synthesized
by Sawady Technology (Tokyo). The probe was labeled at the
5'-end using [γ-32P]ATP (6000 Ci/mmol, Amersham Corp.) and a Megalabel
labeling kit (Amersham Corp.). The DNA fragment probe was
labeled by the multiprime method using [α-32P]dCTP (3000 Ci/mmol,
Amersham Corp.) and a Megaprime labeling kit (Amersham Corp.). The
transfer membrane used was Hybond N (Amersham Corp.), and the
conditions of transfer, fixation, prehybridization, hybridization, and
wash were essentially according to the manufacturer. For the 53-mer
oligonucleotide probe, 45 °C was adopted as the temperature of prehybridization
and hybridization, and 2 x SSC and 0.1% SDS at 60 °C as
the stringent wash conditions. The cloned cDNA in the vector was
automatically subcloned to pBluescript phagemid, and double-stranded
DNA in the phagemid was used as a template for DNA sequencing. DNA
sequencing was performed by the dideoxy chain termination method
(18) using a Taq dye primer sequencing kit (Applied Biosystems), a
thermal cycler (model PJ 480, Perkin-Elmer), and a DNA sequenator
(model 370A, Applied Biosystems).

[Fig. 2 map image: clones EK-2 (3.6 kb), EK-3 (3.6 kb), EK-7 (2.9 kb), and EK-11 (3.0 kb) drawn against a base-pair scale ending at 3597, with restriction sites and poly(A) tails marked.]

Fig. 2. Restriction enzyme mapping of the cDNA clones. The
base pair numbers are according to the numbering of the longest clone,
EK-2. EKR-1 and -2 were positive clones in the random-primed cDNA
library, while EK-2, -3, -7, and -11 were positive in the oligo(dT)-primed
library. All clones had the same map except for an EcoRI site in EK-7.

Computer Modeling of Three-dimensional Structure of L Chain — A
homology search for the L chain was performed in the Brookhaven
Protein Data Bank by the multiple alignment system for protein sequences
(62). Comparing the sequences of the 28 most homologous
proteins of known three-dimensional structure with that of the porcine
L chain, the L chain was divided into 13 parts so that each segment had
a similar deletion and insertion profile. For each segment, one protein
was selected from the homology list so as to minimize insertion and
deletion and to maximize identity. Thus, a chimeric reference protein
was constructed that was composed of the following segments: 1HNE
(human neutrophil elastase) for positions 800-814, 815-825, and 839-856;
1DWB (human thrombin) for 826-838 and 869-892; 3RP2 (A chain,
rat mast cell protease II) for 857-868; 4CHA (A chain, bovine α-chymotrypsin)
for 893-930, 988-1003, and 1018-1034; 3EST (porcine pancreatic
elastase) for 931-944; 1SGT (Streptomyces griseus trypsin) for 945-971;
and 1TLD (bovine β-trypsin) for 972-987 and 1004-1017. Gly
and Arg were inserted into the reference protein 1HNE by using the
coordinates of the main chain of Gln-Arg of Leu-Tyr-Gln-Gln-Arg-Asp-Val-Asn
of 6TIM (triose-phosphate isomerase). The three-dimensional
modeling of the L chain was performed using the chimeric protein
as a reference protein according to Kajihara et al. (19). Modeling of the
complex of the L chain and Val-(Asp)4-Lys was also performed with the
above structural model as a base protein using the coordinates of the
main chain of Lys-Pro-Ala-Cys-Thr-Leu of the inhibitor part in 3SGB
(Protein Data Bank code; proteinase B from S. griseus complexed with
the third domain of turkey ovomucoid inhibitor) for the initial arrangement
of the hexapeptide, essentially according to the same method.
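
The segment assignment described above can be written down as a small lookup table and checked for consistency. This is only a bookkeeping sketch: the residue ranges and Protein Data Bank codes are as read from the text, while the names `SEGMENTS`, `covers_without_gaps`, and `template_for` are illustrative assumptions and do not reproduce the modeling software cited.

```python
# (first residue, last residue, PDB code of the template) as listed in the text.
SEGMENTS = [
    (800, 814, "1HNE"), (815, 825, "1HNE"), (826, 838, "1DWB"),
    (839, 856, "1HNE"), (857, 868, "3RP2"), (869, 892, "1DWB"),
    (893, 930, "4CHA"), (931, 944, "3EST"), (945, 971, "1SGT"),
    (972, 987, "1TLD"), (988, 1003, "4CHA"), (1004, 1017, "1TLD"),
    (1018, 1034, "4CHA"),
]


def covers_without_gaps(segments, first, last):
    """True if the segments tile first..last with no gap and no overlap."""
    ordered = sorted(segments)
    if ordered[0][0] != first or ordered[-1][1] != last:
        return False
    return all(a[1] + 1 == b[0] for a, b in zip(ordered, ordered[1:]))


def template_for(position, segments=SEGMENTS):
    """Return the PDB code assigned to a precursor residue position."""
    for start, end, code in segments:
        if start <= position <= end:
            return code
    raise ValueError(f"position {position} is outside the L chain segments")
```

The check confirms that the thirteen listed segments span positions 800 to 1034, which matches the 235-residue L chain stated in the abstract.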

RESULTS 

Purification and Structural Characterization of Porcine
Enteropeptidase — From 40 porcine duodena, 0.42 mg of the
purified enzyme was obtained in a 6.4% yield with 729-fold
purification (Table I). The molecular weight of the enzyme was
estimated to be approximately 200,000 by gel filtration (data
not shown). As shown in Fig. 1a, SDS-PAGE using a gradient
gel (4-20%) under reducing conditions gave two polypeptide






Fig. 3. The nucleotide and the deduced amino acid sequences of the
cDNA clone EK-2. The boxed amino acid
sequence is the hydrophobic segment presumed
to be an "internal signal sequence."
Underlines with (a), (b), and (c)
indicate sequences that agreed with the
NH2-terminal amino acid sequences determined
for the M, H, and L chains of the
mature enzyme, respectively. The underline
at base pair numbers 3559-3564 indicates
a polyadenylation signal. The residues
in white letters are potential
asparagine-linked glycosylation sites.
The double underlines indicate Ser/Thr
clusters as potential mucin-type glycosylation
sites. Residues with a mark below indicate
the enteropeptidase catalytic triad.



[Fig. 3 sequence image: nucleotide and deduced amino acid sequences of clone EK-2; not legible in the scan.]
it* 


etc 
tl* 


lei 

l*r 


CiC 

\U 


tic 

lla 


Ifll 
II) 


cce 
a I* 


tie 
•■t 


lis 
a* f 


eai 
■ •* 


en 
i*- 


Bit 
Cla 


rtt 

taa 


aaa 
it* 


CIt 
»•« 


ttc 
SB 


ttc 

ttr 


*Ca 
ta> 


eat 
tta 


tac 
rtr 


aia 
11* 


cat 
CI* 


CCI 
*r* 


tl 1 

n* 


tCl 

Cn 


lla 
ita 


CCt 
tr* 


caa 
Cla 


caa 
tta 


aat 
a*a 


tat 

tl a 


til 
ft) 


(11 

ta* 


cct 

tra 


CCI 
tra 


cet 

CIt 


tct 

•r| 


Itl 
Ma 


nil 
111 


III 
Ct* 


lEt 
l*r 


III 
It* 


tci 
*i* 


cec 
ttf 


tct 

Ira 


cct 

11* 


aaa 

iT* 


tn 

f •! 


tia 
!•• 


lat 

In 


Cta 

cta 


cet 

CIt 


let 

i*r 


CCI 
tra 


eet 
at* 


etc 

It* 


•tt 
iif 


etc 
I** 


Ct* 

tl* 


Ctt 
Cla 


CCI 
■ 1* 


Ctl 
at* 


ei 1 
«*i 


tec 

tra 


Cll 
1 ta 


CIt 

\** 


It* 
tfr 


lai 

• •a 


etc 

Cla 


• •• 

iTt 


ICC 
tr* 


lift 
111 


e** 

ClB 


etc 

cia 


C*C 
lla 


iri 

art 


CC* 
*r« 


«** 

Cla 


lat 

t»r 


ate 
39 


• n 
1 1* 


*CI 
lar 


Cit 

Cla 


• tl 
tta 


tic 
•* 1 


tie 

• at 


ICt 
Cf* 


cca 
tl* 


ccc 
tit 


Itl 

tr' 


ett 

Cla 


Ctt 
CI* 


est 

lit 


CCI 
CIt 


aia 

iif 


eai 
*•* 


ICI 
tar 


ISI 
(»• 


Ct: 
11* 


CCC 
Of 


tat 
Ita 


tct 

Itt 


Ctt 
CIt 


Cta 

CIt 


itri 
tl) 


CC* 
».* 


ei* 
i»« 


tit 
*■ 1 


ICG 
Cr* 


Cll 

it* 


ct* 

Cla 


tac 
tao 


aac 
ata 


•tt 


let 

Ira 


etc 

iaa 


CIC 
i*a 


CCI 

• I* 


CCI 
Mt 


CIC 
**l 


«c* 

lar 


tct 

lar 


1 1 1 
tn* 


cc* 

Clf 


Ite 
It* 


etc 
tl* 


ICI 
Ct* 


ct* 
tl* 


cie 

iaa 


cct 

• r * 


t«l 


tee 
t't 


tea 
• 'a 


cct 
I* t 


Ctt 
*f 1 


• tC 
ttr 


etc 

• la 


iiat 


etc 


etc 


etc 


t*S 


1 IC 


te* 


c*e 


tct 


tl* 


etc 


act 


III 


etc 


(tl 


lie 


act 


IIC 


Ctt 


ttt 


ttt 


IC* 


ICt 


• tt 


IC* 


ccc 


tst 


1 1 1 


tte 


It 1 


Ctt 


CM 


1 1 • 


tilt 


*Ci 


tec 


tit 


t** 


til 


etc 


tea 


ni 


ctt 


ttt 


ate 


ICI 


CIC 


1 1 1 


• tc 


cct 


Itt 


CIt 


M t 


ctl 


tic 


CIt 


Cll 


tta 


Ite 


IC3 


ct: 


CI* 


Itl 


Itt 


ttt 


1 1 • 


nil 


c** 


*ei 


t*C 


lie 


tt* 


III 


IC* 


t:i 


III 


nc 


tti 


cee 


iti 


ICI 


1 IC 


at* 


etc 


ft 


an 


tl 1 


tte 


1 1 1 


ctl 


ICt 


t*i 


Cll 


lit 


tn 


cat 


etc 


Ml 


lit 


Kit 


c** 


tct 


l*t 


in 


*•• 


cca 


Itt 


act 


• tt 


ttt 


CM 


1 *t 


CCt 


ttt 


Cla 


CC* 


*•• 


CM 


1 1« 


CC* 


at* 


CIt 


cat 


at* 


etc 


in 


•t: 


(la 


1*1 


1 II 


ttt 


•Ci 



•I 

tit 
tl 

tti 
11 

III 
II 

«•• 

III 

Itt 
Itt 

in 
lit 

ti* 
Itl 

••I 

lit 

• ta 

111 

itil 

in 

• nt 
)*• 

ini 
Itl 

• tti 
tn 

titc 

• M 
ttll 

n« 

1*11 
lit 

nil 
tn 

III* 
tn 

mt 
tt* 

ftii 
111 

titi 
III 

ttti 
tti 

lia< 
111 

itti 
la* 

itf< 
ni 

lit: 
111 

nti 
til 

Ml' 

tt; 

lla- 

• t 

III 

• i< 

tif 

• * 

>it 
■ •t 

trt 



lit 
Itt 
Itt 



lltl CM tta ti l til 



bands with M_r = 152,000 (H chain) and M_r = 48,000 (L chain).
In addition, a cluster of bands was reproducibly observed near
the dye front, which we named the "mini" chain (M chain). The M
chain was shown to be composed of five or more separate
polypeptides with M_r = 16,000-19,000 when analyzed using a
15-25% gradient gel, as shown in Fig. 1b. Upon SDS-PAGE
under nonreducing conditions, the purified enzyme produced
the M chain bands and a polypeptide band with M_r = 200,000
(Fig. 1c). Therefore, we concluded that the purified porcine
mature enzyme is composed of three different polypeptides, the
H, L, and M chains. The former two chains are associated
covalently with each other, while the M chain is bound to the H
and/or L chain non-covalently.

The NH2-terminal amino acid sequences of the H and L
chains of the enzyme were shown to be
SVTVTFDLi-FAQWVSDENIKEELIQGIEA (29 residues) and
IVGGXDSREGAXPXVVAXYYNGQLLXGASLV (31 residues), respectively.
For the M chain, the analyses of the three bands
electrophoretically separated on SDS-PAGE resulted in the same
sequence of LGKSHEARGTMKTTXGVTYNPNL (23 residues). The molar
ratio of the H, L, and M chains in the enzyme, estimated from the
amounts of phenylthiohydantoin-derivatives obtained by
NH2-terminal sequencing, was approximately 1:0.6:0.7 on average.
Considering the variations in the yield of each chain and of the
phenylthiohydantoin-derivatives in the analytical procedures, this
is taken to indicate that the three chains are associated in an
equimolar amount to form the enzyme.

Table II

Comparison of the molecular weight of each chain calculated from
the deduced amino acid sequence with that measured by SDS-PAGE
and the numbers of potential asparagine-linked glycosylation sites
The molecular weight was calculated assuming that no more
processing occurs in the COOH-terminal region of each chain.

            Molecular weight (kDa)     Number of potential
Chain       Calculated   SDS-PAGE      glycosylation sites

M chain        7.6        16-19               1
H chain       75.4        152                17
L chain       26.4         48                 4

Isolation and Characterization of Porcine Enteropeptidase
cDNA Clones — Based on part of the NH2-terminal sequence of
the H chain (Phe10 to Ile27), we designed a 53-mer
oligonucleotide probe including 16 inosines, 6-fold redundant and
complementary to the coding chain: 5'-ATICCITGIATIA(A/G)ITCIT-
CITnATITTITCITCI(C/G)(T/A)IACCCAITGIGCIAA-3'. First,
we screened the random-primed cDNA library using the
oligonucleotide probe. Of about 5 x 10^ independent clones, two
positive clones (EKR-1 and -2) were isolated. Next, using the
insert DNA of EKR-1 as a probe, we screened the oligo(dT)-
primed cDNA library, whose cDNA was size-fractionated to be
larger than approximately 1.6 kilobase pairs. Of 5 x 10'
independent clones, 11 clones giving positive signals were
isolated, 7 of which were later found to be fused with other cDNAs
for unknown reasons and were excluded. The results of restriction
enzyme mapping and DNA sequencing of both ends of the
remaining four clones, named EK-2, -3, -7, and -11, and of EKR-1
and -2 are presented in Fig. 2. The six clones had essentially the
same restriction enzyme map except for an EcoRI site in EK-7.
EK-2 was judged to be the longest clone and was used for
further sequencing.

Structure of Porcine Enteropeptidase

19979

[Fig. 4 alignment of catalytic chain sequences, residues 800-1034 of porcine enteropeptidase; the aligned residues are not recoverable from this copy. Sequences compared and percentages of identity as printed:]

Sequence                       Identity (%)
Enteropeptidase (porcine)
Enteropeptidase (bovine)           89.8
Hepsin (human)                     44.7
Plasma kallikrein (human)          40.4
Factor XI (human)                  39.1
Tryptase (dog)                     39.1
Trypsin (bovine)                   35.3
Chymotrypsin (bovine)              34.1
Elastase (porcine)                 30.6

FIG. 4. Comparison of the amino acid sequence of the catalytic chain of enteropeptidase with those of other serine proteinases.
The catalytic chain sequence of porcine enteropeptidase is compared with those of bovine enteropeptidase (12), human hepsin (21), human plasma
kallikrein (22), human factor XIa (45), dog tryptase (46), bovine trypsin (47), bovine chymotrypsin (48-51), and porcine elastase (52). Residues are
expressed in one-letter code. Symbols mark residues identical with porcine enteropeptidase and deletions inserted to optimize the homology.
Residues in white letters are the conserved catalytic triad, His, Asp, and Ser. The percentages of identity with porcine enteropeptidase are listed
at the ends of the sequences.

Nucleotide and Deduced Amino Acid Sequences of cDNA
Clone EK-2 — The nucleotide and the deduced amino acid
sequences of EK-2 are shown in Fig. 3. The cDNA clone was 3597
base pairs long. It had a polyadenylation signal at the 3559
base pair position and poly(A) at the 3'-end. The first ATG met
the criteria for an initiator codon in eukaryotes (20). Assuming
this codon to be the initiator, the open reading frame was 3102
base pairs long, and thus the deduced amino acid sequence was
composed of 1034 residues. The boxed sequence from positions
19 to 43 was the most hydrophobic domain in the sequence. The
NH2-terminal sequences of the M, H, and L chains were
deduced to start at positions 52, 118, and 800, respectively. Thus,
the enzyme is thought to be originally synthesized as a
single-chain precursor (M_r = 114,763). Assuming that no more
processing occurs in the COOH-terminal region of each chain, the
M, H, and L chains contain 66, 682, and 235 amino acid
residues, respectively. The molecular weight of each chain
calculated from the deduced amino acid sequence was much smaller
than that determined by SDS-PAGE (Table II), probably due to
the presence of oligosaccharide chain(s).
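
The residue counts in this paragraph follow directly from the positions it quotes. A minimal sketch of that arithmetic, assuming the 3102-base pair reading frame excludes the stop codon and that each chain runs up to the residue before the next chain starts (the function and variable names are illustrative and do not come from the paper):

```python
# Values quoted in the text: ORF length in base pairs and the first residue
# of each chain in the precursor numbering.
ORF_BP = 3102
CHAIN_STARTS = {"M": 52, "H": 118, "L": 800}


def orf_residues(orf_bp: int) -> int:
    """Codons in an open reading frame whose length excludes the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3


def chain_lengths(starts: dict[str, int], total: int) -> dict[str, int]:
    """Length of each chain, assuming it ends one residue before the next start."""
    ordered = sorted(starts.items(), key=lambda item: item[1])
    lengths = {}
    for index, (name, start) in enumerate(ordered):
        # The last chain is assumed to run to the COOH terminus of the precursor.
        end = ordered[index + 1][1] - 1 if index + 1 < len(ordered) else total
        lengths[name] = end - start + 1
    return lengths


total = orf_residues(ORF_BP)
print(total)                               # 1034
print(chain_lengths(CHAIN_STARTS, total))  # {'M': 66, 'H': 682, 'L': 235}
```

The 51 residues preceding position 52 are not assigned to any chain here, which matches the statement that the NH2-terminal peptide is missing from the mature enzyme.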

A homology search for the deduced amino acid sequence by
the FASTA program in the PIR protein data base revealed that
the catalytic (L) chain is homologous with those of trypsin- and
chymotrypsin-like serine proteinases (Fig. 4). Human hepsin
(21) and plasma kallikrein (22) showed over 40% identity. The
bovine enzyme (12) was 89.8% identical with the porcine
enzyme. On the other hand, the H chain had interesting
homologies in limited regions of certain proteins. The sequences at
positions 195-236 and 654-692, homologous with each other,
were homologous with those in complement C9 (23), low
density lipoprotein (LDL) receptor (24), etc. (Fig. 5a). The
sequences at positions 240-353 and 539-653 are also homologous
with each other and were homologous with those in
dorsal-ventral patterning protein (25), complements C1r (26) and C1s
(27), etc. (Fig. 5b). The sequence at positions 772-788 was
homologous with those in factor X (28), protein C (29), hepsin
(21), etc. (Fig. 5c).
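
The text does not state how the identity percentages were computed from the alignment. One common convention, sketched here as an assumption, counts identical residues over the aligned columns in which neither sequence has a gap. The two short strings below are made-up toy sequences, not taken from the paper, and `percent_identity` is a hypothetical helper name:

```python
def percent_identity(ref: str, other: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(ref, other):
        if a == gap or b == gap:
            continue  # skip columns containing a deletion
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy aligned pair (illustrative only): 9 ungapped columns, 6 identical.
print(round(percent_identity("IVGGSDAKEG", "IVGG-EAPRG"), 1))  # 66.7
```

Other denominators (for example, the length of the shorter sequence) give somewhat different percentages, so published values can only be reproduced once the convention is known.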

Three-dimensional Structure of L Chain of Porcine
Enteropeptidase as Deduced by Computer Modeling —
Three-dimensional structural modeling of the complex of the
catalytic chain and the NH2 terminus of bovine trypsinogen,
Val1-Asp-Asp-Asp-Asp-Lys6, was performed using the chimeric
reference protein, which was 38.7% identical with the L chain with
a 2-residue insertion in the fourth segment. The resulting model* is
shown in Fig. 6a. The mode of binding of the NH2-terminal

* The coordinate data of the model may be presented on request. 



[Fig. 5 alignment panels; the aligned residues are not recoverable from this copy. Panel titles and sequence labels as printed:]
a, C9/LDL-receptor type region: enteropeptidase (195-236) and (654-692); consensus sequences of LDL receptor, terminal complement components, LDL receptor-related protein, perlecan, and GP-330.
b, C1r/s type region: enteropeptidase (240-353) and (539-653); consensus sequence (C1r/s, DVPP, BMP-1).
c, Carboxyl-terminal region of the non-catalytic chain: enteropeptidase (772-788), hepsin (140-154), factor X (111-133), and protein C (120-142).

FIG. 5. Comparison of partial sequences of the H chain with those of homologous regions in other proteins. a, the cysteine-rich
sequence repeats are compared with the consensus sequences of human LDL receptor (24); human terminal complement components C7 (53), C8α
(54), C8β (55), and C9 (23); human LDL receptor-related protein (56); human perlecan (57); and rat GP-330 (58). The residues identical in at least
six sequences are boxed. b, C1r/s type sequences are compared with the consensus sequence (25) among the sequences of human complement
components C1r (26) and C1s (27), Drosophila dorsal-ventral patterning protein (DVPP (25)), and bone morphogenetic protein-1 (BMP-1 (59)). The
residues identical between the enteropeptidase sequences and the consensus sequence are boxed. c, the sequence near the carboxyl-terminal end
of the H chain is compared with those of the corresponding regions of human hepsin (21), human factor X (42), and human protein C (29). The
residues identical in the four sequences are boxed. In a, b, and c, the values in parentheses indicate residue numbers; other symbols mark a
deletion inserted to optimize the homology and a non-consensus residue.



hexapeptide of bovine trypsinogen with the active site region of
the catalytic chain is also shown (Fig. 6b).

DISCUSSION 

The mature three-chain enzyme is thought to be generated
by peptide bond cleavages from the single-chain precursor in
which the three chains are aligned in the order M, H, and L
chains, starting from the NH2 terminus. Previously, the porcine
enzyme was reported to be composed of two chains, an H chain
(M_r = 134,000) and an L chain (M_r = 62,000) (4). On the other
hand, the human enzyme was reported to be a three-chain
enzyme (10). Two of the human chains have molecular weights
of 140,000 and 54,000, comparable with those of the H and L
chains of the porcine enzyme, respectively, but the third
polypeptide (M_r = 120,000) of the human enzyme is much larger
than the porcine M chain (M_r = 16,000-19,000). Thus, the M
chain appears to be a newly identified component of the
enzyme, although it is not clear at present whether the M chain
is essential for the function of enteropeptidase.

The predicted amino acid sequence of the porcine
enteropeptidase precursor contained a 51-residue peptide sequence,
which is missing in the purified mature enzyme. This peptide
contains a very hydrophobic segment (from Val19 to Ile43) long
enough to span the membranes. Since the precursor protein
does not appear to have any other membrane-spanning
segment or typical signal sequence, this hydrophobic segment
presumably serves as an internal signal sequence (30-32) and
keeps the enzyme bound to membranes. Enteropeptidase is
localized to the brush border membranes of the duodenum and
upper intestine (33, 34) in such a manner that its catalytic
domain can freely contact extracellular trypsinogen. Therefore,
the NH2-terminal region should reside on the cytoplasmic side
and the COOH-terminal region on the outside of the cell. Thus,
enteropeptidase is apparently a Type II* integral membrane
protein. The NH2-terminal positively charged residue(s)
flanking the internal signal sequence is known to be an important
part of a dominantly acting retention signal to create the Type
II orientation (35). The NH2-terminal 51-residue peptide
apparently meets the above structural requirements.

As schematically shown in Fig. 7, the purified porcine
enzyme obviously resulted from proteolytic cleavages at three
sites. Cleavage at Ala51-Leu52 produces the enzyme dissociated
from the membranes. Interestingly, Toyoda et al. (36) reported
that elastase could release enteropeptidase activity from the
brush border membranes. The peptide bond cleavage at
Ala51-Leu52 is compatible with the substrate specificity of elastase.
Therefore, elastase may be responsible for the cleavage. In
addition, other proteinases cleaving Gly117-Ser118 and
Lys799-Ile800 must be present, although no information about them is
available at present.

The H chain has a Ser/Thr-rich sequence at positions 172-
187, comprising 12 residues of Ser/Thr. Such Ser/Thr-rich
regions, which have been found in glycophorin A (37), LDL
receptor (38), sucrase-isomaltase (39), aminopeptidase N (40), etc.,
are documented to be potential O-linked glycosylation sites.
Indeed, polyclonal antibodies against human enteropeptidase
were reported to cross-react with type A blood antigen (10),
indicating the presence of O-linked oligosaccharide(s) in the



* The nomenclature is according to von Heijne and Gavel (61).








FIG. 6. The three-dimensional structure of the L chain of
porcine enteropeptidase constructed by computer modeling. a, the
tube model of the main chain. Segments in the reference chimera
protein derived from 3RP2, 1TLD, 1DWB, 4CHA, 1SGT, 1HNE, and 3EST
^^T^^It^ '"^J^^lSyen, yellow, blue, magenta, cyan, and white,
respectively. The side chains in the basic amino acid cluster,
Arg»"-Arg-Arg-Lys, are shown with the Corey-Pauling-Koltun models
colored in yellow, and those of the active site His»« and Ser»« in
cyan and green, respectively. The ribbon model colored in red shows
the main chain of part of the substrate, Val-Asp-Asp-Asp-Asp-Lys.
b, the stick model of the enzyme interacting with part of the
substrate, Val-Asp-Asp-Asp-Asp-Lys. The substrate part is shown
with the red stick model. The side chains of the catalytic triad of
Asp"', His»«, and Ser*" of the enzyme are shown with the yellow
stick model, and those of the amino acid residues of the enzyme
interacting with the substrate are shown with
i^^i''"!,?^,"*^**^! TTie calculated distances for the two hydrogen 
?^Tk,2*lTC.^P ^ His•«N--Ser-«0^ and the io^c pS^ 

Arg"^'»-GIu*«0«», are 2.74, 3.08, and 2.65 A, respectively. Those fo^ 
IT l?22J'^"J^f.^®®U?® enzyme and the substrate trypsinogen 
(Arg^^N-^-AspW. Arg"«N^».Asp»0*», and Iors«»N«-Asp»0^and Se 
SllSSS? ^.'"yS?, enzyme and substrate main chain atoms 

(Ty^odiN.Asp«0. Gly'^'T^.Asp^O and GIy«*N-l^-0) are 2.66, 2.76, and 
2.51 A and 2.76. 2.77. and 2.69 A. respectively. 

enzyme. Thus, the Ser/Thr-rich segment in the H chain is
presumably the region of O-linked carbohydrate attachment. In
addition, 22 potential N-linked glycosylation sites are seen in
the enzyme, in accord with the previous findings that the
enzyme is heavily glycosylated (4, 6, 7). From the present study,
the carbohydrate content of porcine enteropeptidase is
estimated to be as much as 50% of the total weight.
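
The 50% figure can be checked against the values listed in Table II. The sketch below is one possible way to arrive at it, not necessarily the authors' calculation: it assumes that the whole difference between the SDS-PAGE mass and the sequence-derived mass is carbohydrate, and it takes the midpoint of the 16-19 range for the M chain. Glycoproteins often migrate anomalously on SDS-PAGE, so the result is only a rough estimate.

```python
# (calculated kDa, SDS-PAGE kDa) for each chain, as listed in Table II.
# The M chain ran as a cluster at 16-19 kDa; the midpoint is an assumption.
TABLE_II = {
    "M": (7.6, (16 + 19) / 2),
    "H": (75.4, 152.0),
    "L": (26.4, 48.0),
}


def carbohydrate_fraction(table: dict[str, tuple[float, float]]) -> float:
    """Fraction of the measured mass not accounted for by the polypeptide."""
    calculated = sum(calc for calc, _ in table.values())
    measured = sum(meas for _, meas in table.values())
    return 1.0 - calculated / measured


fraction = carbohydrate_fraction(TABLE_II)
print(f"{fraction:.1%}")  # close to 50%
```

Using 16 or 19 instead of the midpoint for the M chain changes the result by less than one percentage point.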

Two sets of repeating sequences are present in the H chain.
We found two tandem repeats of 38 amino acids (about 30%
identity) including 6 conserved cysteine residues (Fig. 5a).
Although the locations of the disulfide bonds in enteropeptidase
have not been determined, these 6 cysteine residues are likely
to form three intrachain disulfide bonds within each of the two
repeats. They are homologous with certain regions in some
terminal complement components such as C9 (23), LDL
receptor (24), etc. The homologous seven repeating sequences in LDL
receptor are thought to be the sites for interaction with
apolipoproteins (38). Besides, polymeric complement C9 has
recently been reported to have affinity with apolipoproteins (41).
By analogy, the cysteine-containing repeats in enteropeptidase
may also be the sites of interaction with other proteins such as
apolipoproteins. As shown in Fig. 5b, the H chain contains
another two segments with internal homology (about 25%
identity), resembling partial sequences of complement components
C1r (26) and C1s (27), etc. At present, the role of this C1r/s-type
region in the enteropeptidase H chain is not known. In
addition, a region near the COOH-terminal end of the H chain
shows low but detectable sequence homology with the
corresponding regions of the non-catalytic chains of some other
serine proteinases (Fig. 5c). In protein C (29) and factor X (42),
proteolytic cleavages in the activation process are known to
occur at mono- or dibasic sites between these regions and the
NH2 termini of the catalytic chains. By analogy, the
enteropeptidase precursor may be cleaved at the dibasic site Lys'^^-Lys^^
at first and then activated by the cleavage at the NH2 terminus
of the L chain.

On the other hand, the L chain is highly homologous with the
catalytic chains of other serine proteinases (Fig. 4). The
three-dimensional structural model of the L chain indicates that the
catalytic triad, His^o, Asp'"'*, and Ser««, and the S1 pocket are
situated essentially in the same manner as in trypsin (43).
Moreover, in the S1 pocket, Asp^" positioned at its bottom and
Gly and Gly*'**' at its neck are also conserved in
enteropeptidase, indicating that it is a typical trypsin-like serine
proteinase. Since enteropeptidase has a strict specificity toward
substrates with acidic amino acid residues at the P2-P5 sites, the
presence of additional sites (S2-S5) for substrate side chain
binding has been postulated (3, 8). Lysine residue(s) has been
suggested to be important to the substrate specificity of porcine
enteropeptidase by a chemical modification study (44).
According to the present structural model of the porcine L chain
including the NH2-terminal hexapeptide (Val1-Asp-Asp-Asp-Asp-
Lys6) of bovine trypsinogen (Fig. 6b), the basic cluster sequence,
Arg«**-Arg-Arg-Lys«", unique to enteropeptidase among the
family of serine proteinases (Fig. 4), appears to make a turn
structure adjacent to the S1 pocket and interact with Asp2-Asp-
Asp-Asp5 of trypsinogen through three strong salt bridges:
Arg*o» versus Asp^, Arg^^ versus Asp», and Lys*" versus Asp^.
This is consistent with the previous results indicating that an
acidic amino acid at the P2 site in the substrate is essential and
that those at the P3-P5 sites are beneficial for the cleavage (3,
8). In the bovine L chain, the residue corresponding to Arg^ is
substituted with Lys (12), but the substitution does not seem to
cause any significant effect on the interaction with the
substrates. Moreover, Arg«" makes an ion pair with Glu^^a. The
carboxyl group of Asp^ of the peptide does not interact with the
enzyme in this model but may form an ion pair with the side
chain of Lys"* of bovine trypsinogen as judged from a
three-dimensional structure model (data not shown). Further, the
main chain atoms, Asp*O, Asp^O, and Lys6 O of the peptide




[Fig. 7 schematic diagram; labels as printed: Cytosol; Brush border membranes; Lumen; "Mini" chain; Heavy chain (S/T, C9-a, C1-a, C9-b, C1-b, CNCC); Light chain (AP, Cat); S-S; -COOH]

FIG. 7. The gross structure of the precursor form of porcine enteropeptidase, the sites of proteolytic processing, and potential
asparagine-linked glycosylation sites. ISS, putative internal signal sequence; S/T, Ser/Thr-rich sequence; C9-a and -b, repeating sequences
homologous with part of the sequences of complement C9/LDL receptor; C1-a and -b, repeating sequences homologous with part of the sequences
of complement C1r/s; CNCC, sequence near the COOH-terminal region of the H chain homologous with those of the noncatalytic chains of two-chain
serine proteinases such as factor X and protein C; AP, putative activation peptide; Cat, catalytic domain. Closed circles indicate potential
asparagine-linked glycosylation sites. Vertical arrows indicate proteolytic processing sites.



substrate form three hydrogen bonds with the atoms, Tyr*°**^N,
Gly**"N, and Gly®**N of the enzyme, respectively. Thus, the
unique substrate specificity of enteropeptidase can be
explained clearly.

Acknowledgments — We thank Dr. S. B. P. Athauda, Dr. Y. Tamanoue,
and Y. Tsuchiya for valuable discussion of this work; Y. Sakurai for
NH2-terminal amino acid sequence analysis and kind advice on
measurement of the enzyme activity; and Dr. H. Komooka and Dr. K.
Kamiya for the computer programs used in deducing the
three-dimensional structure of the catalytic chain.

REFERENCES 

1. Light, A., and Janska, H. (1989) Trends Biochem. Sci. 14, 110-112
2. Ghishan, F. K., Lee, P. C., Lebenthal, E., Johnson, P., Bradley, C. A., and Greene, H. L. (1983) Gastroenterology 89, 727-731
3. Maroux, S., Baratti, J., and Desnuelle, P. (1971) J. Biol. Chem. 246, 5031-5039
4. Baratti, J., Maroux, S., Louvard, D., and Desnuelle, P. (1973) Biochim. Biophys. Acta 315, 147-161
5. Grant, D. A. W., and Hermon-Taylor, J. (1975) Biochem. J. 147, 363-366
6. Grant, D. A. W., and Hermon-Taylor, J. (1976) Biochem. J. 156, 243-254
7. Liepnieks, J. J., and Light, A. (1979) J. Biol. Chem. 254, 1677-1683
8. Light, A., Savithri, H. S., and Liepnieks, J. J. (1980) Anal. Biochem. 108, 199-206
9. Fonseca, P., and Light, A. (1983) J. Biol. Chem. 258, 14516-14520
10. Magee, A. I., Grant, D. A. W., and Hermon-Taylor, J. (1981) Clin. Chim. Acta 115, 241-254
11. Light, A., and Fonseca, P. (1984) J. Biol. Chem. 259, 13195-13198
12. LaVallie, E. R., Rehemtulla, A., Racie, L. A., DiBlasio, E. A., Ferenz, C., Grant, K. L., Light, A., and McCoy, J. M. (1993) J. Biol. Chem. 268, 23311-23317
13. Bradford, M. M. (1976) Anal. Biochem. 72, 248-254
14. Laemmli, U. K. (1970) Nature 227, 680-685
15. LeGendre, N., and Matsudaira, P. (1989) in A Practical Guide to Protein and Peptide Purification for Microsequencing (Matsudaira, P., ed.), pp. 52-72, Academic Press Inc., San Diego, CA
16. Ullrich, A., Shine, J., Chirgwin, J., Pictet, R., Tischer, E., Rutter, W. J., and Goodman, H. M. (1977) Science 196, 1313-1319
17. Gubler, U., and Hoffman, B. J. (1983) Gene (Amst.) 25, 263-269
18. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
19. Kajihara, A., Komooka, H., Kamiya, K., and Umeyama, H. (1993) Protein Eng. 6, 615-620
20. Kozak, M. (1984) Nucleic Acids Res. 12, 857-872
21. Leytus, S. P., Loeb, K. R., Hagen, F. S., Kurachi, K., and Davie, E. W. (1988) Biochemistry 27, 1067-1074
22. Chung, D. W., Fujikawa, K., McMullen, B. A., and Davie, E. W. (1986) Biochemistry 25, 2410-2417
23. DiScipio, R. G., Gehring, M. R., Podack, E. R., Kan, C. C., Hugli, T. E., and Fey, G. H. (1984) Proc. Natl. Acad. Sci. U. S. A. 81, 7298-7302
24. Südhof, T. C., Goldstein, J. L., Brown, M. S., and Russell, D. W. (1985) Science 228, 815-822
25. Shimell, M. J., Ferguson, E. L., Childs, S. R., and O'Connor, M. B. (1991) Cell 67, 469-481
26. Journet, A., and Tosi, M. (1986) Biochem. J. 240, 783-787
27. Mackinnon, C. M., Carter, P. E., Smyth, S. J., Dunbar, B., and Fothergill, J. E. (1987) Eur. J. Biochem. 169, 547-553
28. McMullen, B. A., Fujikawa, K., Kisiel, W., Sasagawa, T., Howald, W. N., Kwa, E. Y., and Weinstein, B. (1983) Biochemistry 22, 2875-2884
29. Foster, D., and Davie, E. W. (1984) Proc. Natl. Acad. Sci. U. S. A. 81, 4766-4770
30. Bos, T. J., Davis, A. R., and Nayak, D. P. (1984) Proc. Natl. Acad. Sci. U. S. A. 81, 2327-2331
31. Spiess, M., and Lodish, H. F. (1986) Cell 44, 177-185
32. Schmid, S. R., and Spiess, M. (1988) J. Biol. Chem. 263, 16886-16891
33. Hermon-Taylor, J., Perrin, J., Grant, D. A. W., Appleyard, A., Bubel, M., and Magee, A. I. (1977) Gut 18, 259-265
34. Lojda, Z., and Gossrau, R. (1983) Histochemistry 78, 251-270
35. Hartmann, E., Rapoport, T. A., and Lodish, H. F. (1989) Proc. Natl. Acad. Sci. U. S. A. 86, 5786-5790
36. Toyoda, S., Lee, P. C., and Lebenthal, E. (1985) Dig. Dis. Sci. 30, 1174-1180
37. Tomita, M., Furthmayr, H., and Marchesi, V. T. (1978) Biochemistry 17, 4756-4770
38. Soutar, A. K., and Knight, B. L. (1990) Br. Med. Bull. 46, 891-916
39. Hunziker, W., Spiess, M., Semenza, G., and Lodish, H. F. (1986) Cell 46, 227-234
40. Watt, V. M., and Yip, C. C. (1989) J. Biol. Chem. 264, 5480-5487
41. Hamilton, K. K., Zhao, J., and Sims, P. J. (1993) J. Biol. Chem. 268, 3632-3638
42. Leytus, S. P., Chung, D. W., Kisiel, W., Kurachi, K., and Davie, E. W. (1984) Proc. Natl. Acad. Sci. U. S. A. 81, 3699-3702
43. Stroud, R. M., Kay, L. M., and Dickerson, R. E. (1974) J. Mol. Biol. 83, 185-208
44. Baratti, J., and Maroux, S. (1976) Biochim. Biophys. Acta 452, 468-496
45. Fujikawa, K., Chung, D. W., Hendrickson, L. E., and Davie, E. W. (1986) Biochemistry 25, 2417-2424
46. Vanderslice, P., Craik, C. S., Nadel, J. A., and Caughey, G. H. (1989) Biochemistry 28, 4148-4155
47. Mikeš, O., Holeyšovský, V., Tomášek, V., and Šorm, F. (1966) Biochem. Biophys. Res. Commun. 24, 346-352
48. Hartley, B. S. (1964) Nature 201, 1284-1287
49. Meloun, B., Kluh, I., Kostka, V., Morávek, L., Prusík, Z., Vaněček, J., Keil, B., and Šorm, F. (1966) Biochim. Biophys. Acta 130, 543-546
50. Hartley, B. S., and Kauffman, D. L. (1966) Biochem. J. 101, 229-231
51. Blow, D. M., Birktoft, J. J., and Hartley, B. S. (1969) Nature 221, 337-340
52. Kawashima, I., Tani, T., Shimoda, K., and Takiguchi, Y. (1987) DNA (N.Y.) 6, 163-172
53. DiScipio, R. G., Chakravarti, D. N., Müller-Eberhard, H. J., and Fey, G. H. (1988) J. Biol. Chem. 263, 549-560
54. Rao, A. G., Howard, O. M. Z., Ng, S. C., Whitehead, A. S., Colten, H. R., and Sodetz, J. M. (1987) Biochemistry 26, 3556-3564
55. Howard, O. M. Z., Rao, A. G., and Sodetz, J. M. (1987) Biochemistry 26, 3565-

3570 

56. Herx, J., Hamnnn, U., RogneT S.. Myklebost, O.. Gausepohl. H., and Stanley. K. 

K (1988) BMBO J. 7, 4119--4127 

57. Murdoch. A. D.. Dodge, C. R., Cohen. L. IXian. R. S., and louo. R. V. (1992) J. 

Bid, Chem. 267, 8544-^6557 

58. Raychowdbury, R., Ntles, J. U, McCluskey. R T., and Smith. J. A. (1989) 

Science 244, 1163-1165 

59. Wozney. J. M., Rosen. V.. Celeste. A. J., Mitsock. L. M.. Whitterv, M. J.. Kritx. 

R W., Hewick. R. M., and Wang, E. A. (1988) Sctanc* 242. 1528-1534 

60. Beigcr, A., and Schechter, I. ( 1970) PhOos. TVonx. it 5oc. Land. B 287, 24^264 

61. von Heiine. G., and Gavel. Y. (1988) Eur. J. Biochem. 174, 671-678 

62. Komooka. H., and Umeyama, H. (1991) Abstracts of the I4th Symposium on 

Chemieai Infitrmation and Computer Sdenoe, Kawaguchi, pp. 71-73. 
Chemical Society of Japan. Tb^yo 



Exhibit 23 



Perspectives in 
Bioconjugate Chemistry 



Edited by 

Claude F. Meares 

University of California 




American Chemical Society, Washington, DC 1993 






Library of Congress Cataloging-in-Publication Data

Perspectives in bioconjugate chemistry / edited by Claude F. Meares. 
p. cm. 

Contains a collection of articles previously published in the journal
Bioconjugate Chemistry.

Includes bibliographical references and index.

ISBN 0-8412-2672-5 

1. Bioconjugates. 

I. Meares, Claude F., 1946- . II. American Chemical Society.
QP517.B49P47 1993 

574. 19'2— dc20 93-15385 



The paper used in this publication meets the minimum requirements of American
National Standard for Information Sciences: Permanence of Paper for Printed
Library Materials, ANSI Z39.48-1984.



Copyright © 1993 
American Chemical Society 

All Rights Reserved. The appearance of the code at the bottom of the first page 
of each chapter in this volume indicates the copyright owner's consent that
reprographic copies of the chapter may be made for personal or internal use or
for the personal or internal use of specific clients. This consent is given on the
condition, however, that the copier pay the stated per-copy fee through the
Copyright Clearance Center, Inc., 27 Congress Street, Salem, MA 01970, for
copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law. 
This consent does not extend to copying or transmission by any means — graphic 
or electronic— for any other purpose, such as for general distribution, for 
advertising or promotional purposes, for creating a new collective work, for 
resale, or for information storage and retrieval systems. The copying fee for each 
chapter is indicated in the code at the bottom of the first page of the chapter. 

The citation of trade names and/or names of manufacturers in this publication is 
not to be construed as an endorsement or as approval by ACS of the commercial 
products or services referenced herein; nor should the mere reference herein to 
any drawing, specification, chemical process, or other data be regarded as a 
license or as a conveyance of any right or permission to the holder, reader, or any 
other person or corporation, to manufacture, reproduce, use, or sell any patented 
invention or copyrighted work that may in any way be related thereto. Registered 
names, trademarks, etc., used in this publication, even without specific indication 
thereof, are not to be considered unprotected by law. 



CIP 




PRINTED IN THE UNITED STATES OF AMERICA 



Chapter 2 



Chemical Modifications of Proteins: 
History and Applications 

Gary E. Means and Robert E. Feeney

Department of Biochemistry, The Ohio State University, Columbus, OH 43210, and Department
of Food Science and Technology, University of California, Davis, CA 95616

Reprinted from Bioconjugate Chemistry, Vol. 1, No. 1, January/February, 1990



With roots in ancient formulations, methods for the 
chemical derivatization of proteins continue to expand 
and develop. The creation of this new journal dealing 
exclusively with bioconjugate chemistry was barely con- 
ceivable just a few years ago. An explosion of interest 
in the subject during the last decade is, however, easily 
seen. The tremendous growth in both the number of pub- 
lications and in the number of research groups involved 
in these kinds of studies has been promoted by both prac- 
tical interests related, for example, in some cases to pos- 
sible pharmacological or medical diagnostic applications 
and by interest in questions of fundamental biochemi- 
cal structure and function. 

Greatly improved understanding of established reagents
and procedures and the development of many new, and
more sophisticated, reagents and procedures have been
facilitated by advances in the ancillary fields of organic
chemistry, X-ray crystallography, and molecular biology.
Whereas protein modification in the past often
involved the same reagents and reactions commonly used
in the organic chemistry of that time (e.g., acetylation,
iodination, deamination, and reaction with formaldehyde),
those in most common use today have, by and large, been
developed to meet the varied but relatively specific needs
of the protein chemist. A large number of specialized
reagents have been described: affinity labels, photoaffinity
labels and other specifically designed site-directed
reagents (1, 2), group-selective reagents which react exclusively
(or at least predominantly) with one particular type
of amino acid side chain (see below, especially Table II),
and others that react relatively nonspecifically with a number
of different side chains (3).

Reagents have been designed to preserve electrostatic
charge (4, 5), to alter electrostatic charge (6), and to increase
hydrophobicity (7, 8). Reagents and procedures have been
developed to decrease immunogenicity (9, 10), to increase
and decrease susceptibility to proteolysis (11-13), to
increase UV or visible absorbancy (14), to introduce
fluorescent labels (15, 16), spin labels (17), radiolabels (18-20),
various metal ions (21), magnetic microspheres (22, 23),
and electron-dense substituents (24), to increase the
content of certain low-abundance nonradioactive isotopes (25),
and to attach several different types of carbohydrate
moieties (26-29), biotin (30), and a number of
other biospecific recognition groups (e.g., avidin, streptavidin,
antibodies, protein A, protein G, lectins, and others (31)).
Procedures also have been developed to effect
the cleavage of peptide chains (32, 33); to modify enzyme
specificity (34); to modify the terminal hydroxyls of galactosyl
residues in glycoproteins (35); and to introduce intramolecular
and intermolecular cross-links, both to couple
already associated species (36, 37) and to join various
proteins, which might or might not otherwise associate,
in order to combine the properties of both into a single
molecule, e.g., to make protein-protein conjugates (38, 39),
enzyme-linked antibodies (40, 41), immunotoxins (42, 43),
and drug-protein conjugates (44). A large number
of reagents that have been developed to serve these and
a variety of other purposes are commercially available.

EARLY DEVELOPMENTS 

The chemistry of proteins had its origin in the chemistry
of the amino acids and only later concerned the amino
acid side chains of intact proteins. For practical purposes,
a variety of procedures for protein modification
had been developed and used for many years prior to any
significant interest in or understanding of protein chemistry.
For example, the use of formaldehyde and other
agents in the tanning industry was apparently formulated
entirely on the basis of empirical observations, without
any real understanding of the reactions or of the chemical
nature of the materials involved. Similar procedures
were also employed successfully to convert a number
of protein toxins, usually of bacterial origin, into toxoids,
which retain some of the original antigenic determinants
but are no longer toxic. Inoculations of toxoids
are still widely employed to confer immunity against a
number of serious bacterial diseases. Although formaldehyde
is still widely used for this purpose, not much is known about
the manner by which it converts toxins into toxoids.

2672-5/93/0010$06.00/0
© 1990 American Chemical Society

Interest in quantitative determinations of proteins and
their various constituent amino acids was a major impetus
for many early studies of chemical modification. While
a significant number of proteins had been crystallized
by the 1920s, analytical values for individual amino acids
were still quite poor well into the 1940s. Analytical data
had, for example, revealed only one sulfur-containing amino
acid, cystine, in naturally occurring proteins prior to the
discovery of methionine in 1922. Threonine was not
discovered until 13 years later.

Most of the procedures available at that time for the
determination of individual amino acids were, of course,
supplanted by the development of the far more convenient
cation-exchange amino acid analyzer in the 1950s.
Slightly altered forms of some of those procedures, however,
still find use today. Variations of the Van Slyke
procedure for determining protein nitrogen, for example,
are still sometimes useful for bringing about the selective
deamination of proteins. Sodium nitroprusside, which
was once used for spectrophotometric determinations of
cysteine, also appears to be useful for the selective
modification of protein thiol groups. Some much more recently
developed procedures for protein modification, on the other
hand, have been shown to be useful for analytical
determinations of certain amino acids in proteins. The use of
water-soluble carbodiimides and certain nucleophiles to
determine amounts of glutamine and asparagine, and of
2-hydroxy-5-nitrobenzyl bromide to determine tryptophan
contents of proteins, are possibly of special interest
since the acid lability of those amino acids makes
their determinations difficult by conventional amino acid
analysis (45, 46). The use of TNBS* for the determination
of amino groups (47) and of DTNB for the determination
of thiol groups (48) in intact proteins has also
achieved special status as a result of its widespread
use for such purposes.

By the end of World War II, interest had turned to
determining particular amino acid residues necessary for
the biological activities of proteins. That a particular
amino acid residue in the active site of an enzyme might
be identified on the basis of its reaction with selective
chemical reagents was an idea developed during this period.
Those interests and further careful scrutiny of the available
methodology led to the publication of two important
reviews of protein modification in 1947 (49, 50). The
report of Balls and Jansen (51) showing that the inactivation
of several proteases by diisopropyl fluorophosphate
resulted from its reaction with a specific serine residue
in each case was another milestone of this period.

Some of the earliest attempts to use chemical modification
procedures to identify particular amino acid residues
required for the biological activity of a protein were
conducted in the laboratory of Heinz Fraenkel-Conrat
(52-54). A few of those procedures are still used, with
little change, to this day. However, these earlier studies
were seriously hampered by the absence of sensitive and
accurate procedures to determine the number and type(s)
of amino acid residues undergoing modification and by
the absence of effective micro and semimicro procedures
to separate, purify, and characterize products. The studies
of that period, nevertheless, provided important descriptions
of procedures for use by other investigators and
served as important steps in the later development of
improved procedures.

* Abbreviations are as follows: trinitrobenzenesulfonic acid,
TNBS; 5,5'-dithiobis(2-nitrobenzoic acid), DTNB; tosylphenylalanine
chloromethyl ketone, TPCK; dithiothreitol, DTT;
1-ethyl-3-[3-(dimethylamino)propyl]carbodiimide, EDC.

Quantitative data on the extent of modification became
more attainable with the increased availability of
radioactively labeled reagents during the 1960s. Greater access
to automated amino acid analyzers (55) and the development
of effective ion-exchange and gel exclusion chromatography
media at about the same time also facilitated
the characterization of modified proteins, which
led to a better understanding of many modification
reagents and procedures. Various forms of micro gel
electrophoresis also became commonplace in the same decade,
and these greatly enhanced the ability to monitor the
effects of modification on relatively small amounts of protein.
The advent of an effective procedure for the routine
determination of amino acid sequences, first described
by Edman in 1956 (56), was also a major milestone.
Although often considered routine today, these procedures
were developed only after many years of effort and
were essential for the characterization of various
modification procedures.

SITE-SPECIFIC MODIFICATIONS 

In 1962, Wofsey and co-workers (57) described a selec- 
tive reaction of the p-arsonylbenzenediazonium ion with 
the antigen-combining site of a rabbit anti-p-azoben- 
zenearsonate antibody. This demonstration of affinity 
labeling was followed in about 1 year by the description 
of a highly selective reaction between chymotrypsin and 
a reactive substratelike compound, TPCK (58). The lat- 
ter was shown to effect the modification of a particular 
histidine residue of chymotrypsin with the complete elim- 
ination of its catalytic activity. The selectivity of these 
and other affinity labels results from their resemblance 
to a substrate or ligand. Their strong affinity for a par- 
ticular site concentrates a reactive group, like the chlo- 
romethyl ketone moiety of TPCK, at a specific site, where 
its reaction with a nearby amino acid side chain is pro- 
moted by mutual proximity. Subsequent to these reports, 
a very large number of affinity labeling reagents have 
been described. Affinity labeling is now one of the most 
important methods for identifying amino acid residues 
in enzyme active sites. Table I describes some of the 
most commonly used types of affinity labeling reagents 
and summarizes a few of their salient properties.

SIDE CHAIN SELECTIVE MODIFICATIONS 

The use of the side chain selective reagents (i.e., those 
which react, under certain specified conditions, with a 
single or, at least, a limited number of side-chain groups 
in a fairly predictable manner) is, however, a simpler 
approach. At least for initial screening, it is still widely
used to identify amino acid side chains required for biological
activity. Table II contains a list of some of the most
commonly used and, in the authors' opinions, most use- 
ful group-selective reagents and brief descriptions of some 
of their important properties and applications. 

The retention of biological activity after treatment with 
one of those reagents is usually good a priori evidence 
that the modified amino acid side chains are not required 
for that particular activity. Under appropriate conditions,
each reagent normally reacts only with the indicated
target side chain(s). Depending on the protein, the
reagent, and the particular conditions, however, com- 
plete modification of all such side chains is not always 
obtained. In most cases, the extent of reaction can be 
determined by either direct spectrophotometric measure- 
ments, amino acid analyses, or the use of radioactive 



Table I. Major Types of Affinity Labels

type | example | target enzymes | reaction characteristics | refs cited
α-halocarbonyl (RCOCH2X) | TPCK | chymotrypsin | addition to nucleophilic groups, especially His and Cys(SH), also COO- | 58
 | 3-bromo-2-ketoglutarate | isocitrate dehydrogenase | | 59
 | chloroacetol sulfate | triose phosphate isomerase | | 60
epoxide (R-CH-CH2 bridged by O) | 1,2-anhydromannitol 6-phosphate | glucose 6-phosphate isomerase | addition to various nucleophilic groups, COO-, Cys(SH) | 61
 | glycidol phosphate | triose phosphate isomerase, enolase | | 62
sulfonyl fluoride | 5'-[p-(fluorosulfonyl)benzoyl]adenosine | glutamine synthetase, etc | addition to various nucleophilic groups, Cys(SH), Lys, His, etc | 63
aldehyde (RCH=O) | 2',3'-dialdehydo-ATP | pyruvate carboxylase, adenylate cyclase, etc | synthesized by periodate oxidation of ATP; addition to amino groups, especially in the presence of NaBH4; dialdehyde derivatives of other nucleotides and nucleosides may be employed similarly | 64, 65
 | pyridoxal phosphate | glycogen phosphorylase, glutamine synthetase, NA polymerase, etc | reaction with Lys in PLP and phosphate binding sites; irreversible in the presence of NaBH4 or NaBH3(CN) | 66-68
azido (photoaffinity labels) | 8-azido-ATP | F1-ATPase | requires UV irradiation; reacts by addition to nucleophiles and double bonds, insertion into C-H and O-H bonds, and other reactions | 69
 | 5-azido-UDP | UDP-glucose pyrophosphorylase | | 70



Table II. Useful Side Chain Modification Reagents*

side chain or group | reagent or procedure | optimum reaction pH, side chain selectivity, and other comments | refs cited
amino (Lys + α) | amidination (ethyl acetimidate) | pH ~9, no other side chains react, positive charge maintained, other imido esters are available, extent of modification may be determined with TNBS | 4, 71
 | reductive alkylation (formaldehyde + NaBH4 or NaBH3CN) | pH ~9 with NaBH4, pH [value not legible] with NaBH3CN; reaction is much slower under the latter conditions; no other side chains react; positive charge maintained; other aldehydes and reducing agents may be used; extent of modification may be determined by amino acid analysis, the incorporation of radiolabel, or with TNBS | 5, 25
 | acylation (acetic anhydride) | pH ~8 and above, Tyr residues also modified, elimination of positive charge, extent of modification may be determined with TNBS | 72
 | acylation (succinic anhydride) | same as above, Tyr residues undergo slow deacylation above pH ~6, replaces positive charges with negative charges | 73
 | trinitrobenzenesulfonate | pH ~8 and above, also reacts slowly with thiol groups, eliminates positive charge and introduces large hydrophobic substituent, extent of reaction may be determined spectrophotometrically | 47, 74
carboxyl (Asp + Glu) | water-soluble carbodiimide + nucleophile (EDC + glycine ethyl ester) | pH ~4.6-5, some side reactions with Tyr and thiol groups, other carbodiimides are available, many other nucleophiles (amines) may be used to either maintain or alter the charge, extent of reaction may be determined by amino acid analysis or from incorporation of radiolabel | 46, 75
guanidino (Arg) | dicarbonyls [2,3-butanedione, phenylglyoxal, and (p-hydroxyphenyl)glyoxal] | pH ~7 or higher, reaction promoted by borate buffer, no major side reactions; partially reversible upon dialysis, eliminates positive charge, extent of reaction can be determined from incorporation of radiolabel or by amino acid analysis, other dicarbonyl compounds can also be used (e.g., cyclohexanedione, glyoxal, etc) | 76-79
imidazole (His) | diethyl pyrocarbonate (ethoxyformic anhydride) | pH ~4-5, side reactions with Lys kept to minimum by low pH, extent of modification may be determined by spectrophotometric measurement, reversed in the presence of NH2OH | 80, 81
indole (Trp) | N-bromosuccinimide | usually pH ~4 or lower, higher pH values can be used; thiol groups are rapidly oxidized; Tyr and His react more slowly; extent of modification may be determined spectrophotometrically or by amino acid analysis | 82
 | 2-hydroxy-5-nitrobenzyl bromide | pH <7.5, slight reaction with thiols, strong visible absorbance can be used to determine the extent of reaction | 83, 84
phenol (Tyr) | iodination (I3-, chloramine T + I-, ICl, and lactoperoxidase + H2O2) | pH ~8 or higher, many different procedures and reagents, His also reacts but usually to a lesser extent, thiol groups are rapidly oxidized, both mono and diiodo derivatives are formed, the extent of reaction can be estimated spectrophotometrically or by amino acid analysis, widely used for radiolabeling of proteins | 18, 85, 86
 | tetranitromethane | pH ~8 or slightly higher, thiol groups are also rapidly oxidized, some nitration of Trp, extent of reaction may be determined spectrophotometrically or by amino acid analysis | 87
thiol (Cys-SH) | carboxymethylation (iodo- and bromoacetate and iodo- and bromoacetamide) | pH ~7 or higher; no effect on other residues under appropriate conditions; Lys, His, Tyr and Met react slowly with excess reagent and long reaction times; extent of reaction may be determined with DTNB, by the incorporation of radiolabel, or by amino acid analysis | 88, 89
 | N-ethylmaleimide | pH ~6 or higher, reactions with Lys and His are much slower at pH 7 and usually of no importance, the extent of reaction may be determined from incorporation of radiolabel or by amino acid analysis | 90, 91
 | 5,5'-dithiobis(2-nitrobenzoic acid) (Ellman's reagent) | pH ~7 or higher, no other side chains react, reversible in presence of excess low MW thiol, the extent of modification can be determined spectrophotometrically | 48, 92
thioether (Met) | oxidation (H2O2) | pH ~1 and higher, thiol groups also react very rapidly, reversed by treatment with low MW thiols, extent of modification may be determined by amino acid analysis after alkaline hydrolysis or by carboxymethylation followed by acid hydrolysis | 93

* Many useful reagents have not been included due to space limitations. Descriptions of reaction conditions, outcomes and literature
citations are also brief and incomplete for the same reason. More complete information is available in the references and other sources cited
elsewhere in this review.






reagents. Indirect determinations can also be obtained 
from the number of unreacted amino acid residues, as 
determined either spectrophotometrically (e.g., amino
groups by TNBS (47) or thiol groups by DTNB (48)) or
by amino acid analysis. The extent of reaction can, of 
course, almost always be increased by the use of more 
vigorous reaction conditions, e.g., longer reaction times, 
larger excesses of reagent, and the presence of urea or 
other denaturing agents. Using more severe conditions, 
however, is usually accompanied by some decrease in side- 
chain selectivity, greater risk of conformational change, 
and, sometimes, other disadvantages. Reaction with other 
than target side chains may be of little importance when 
activities are not affected. 
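
As a rough illustration of the direct and indirect spectrophotometric determinations described above, the following Python sketch converts an absorbance reading into moles of chromophore per mole of protein by the Beer-Lambert law. The function names, the 1 cm path length, and the numbers in the example are assumptions made for illustration and are not taken from this review; the molar absorptivity appropriate to a given chromophore and buffer must be supplied by the user.

```python
def groups_per_protein(absorbance, protein_molar, epsilon, path_cm=1.0):
    """Moles of chromophore per mole of protein from the Beer-Lambert law."""
    if protein_molar <= 0 or epsilon <= 0 or path_cm <= 0:
        raise ValueError("concentration, absorptivity, and path length must be positive")
    chromophore_molar = absorbance / (epsilon * path_cm)
    return chromophore_molar / protein_molar


def groups_modified_indirect(total_groups, unreacted_groups):
    """Indirect estimate: groups modified = total groups - groups still unreacted."""
    return total_groups - unreacted_groups


if __name__ == "__main__":
    # Hypothetical reading; the absorptivity (M^-1 cm^-1) is an assumed, user-supplied value.
    unreacted = groups_per_protein(absorbance=0.283, protein_molar=1.0e-5, epsilon=14150.0)
    print(round(unreacted, 2))
    print(round(groups_modified_indirect(total_groups=4, unreacted_groups=unreacted), 2))
```

The indirect route simply subtracts the groups that still react with the analytical reagent from the total present before modification, so any error in the total carries straight through to the estimate.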

A major loss of biological activity upon such treatment
is often taken as evidence for the essentiality of
the group modified. But this interpretation must be made
with somewhat less conviction, owing to the possibility
of unrecognized conformational changes or other subtle
effects that may always accompany the modification of
a protein. The latter are obviously of less concern when
fewer side chains are modified and for those modifications
that effect the least change in the size and character
of side chains. Luckily, a reasonable number of reagents
are available for some of the more important side chains,
allowing some discretion as to the nature of the modifications
that may be effected. Rat liver glycine methyltransferase,
for example, is completely inactivated by reaction
with excess DTNB (94). The inactivated enzyme
is, however, almost completely reactivated by subsequent
treatment with potassium cyanide which, presumably,
brings about the replacement of a relatively large
and anionic 2-nitro-5-thiobenzoate moiety by a smaller
cyano group with no formal charge, as follows:

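$$
\mathrm{P{-}S{-}H} + \mathrm{DTNB}
\;\xrightarrow{\;-\,\mathrm{NTB}^-\;}\;
\mathrm{P{-}S{-}S{-}C_6H_3(COO^-)(NO_2)}
\;\xrightarrow[\;-\,\mathrm{NTB}^-\;]{\;\mathrm{CN}^-\;}\;
\mathrm{P{-}S{-}C{\equiv}N}
\tag{1}
$$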

A carboxymethyl moiety introduced by reaction with 
iodoacetate is also anionic but intermediate in size and 
effects only a partial loss of activity. The larger groups 
thus appear to block or otherwise perturb the active site, 
although none of the cysteine residues to which they are 
attached are really essential for catalytic activity. 

Similar inactivations have been noted following the addi- 
tion of large or charged groups to the cysteine residues 
of many enzymes that are either not inactivated or are 
only partially inactivated by the addition of smaller groups. 
2-Nitro-5-thiocyanatobenzoic acid can be used to effect 
a direct, single-step addition of cyano moieties to thiol 
groups (95, 96), although its reactions are not quite as 
simple as they might initially seem (97). Another reagent, 
methyl methanethiosulfonate, can be used to attach relatively
small, uncharged thiomethyl groups to cysteine residues,
usually with comparable results (98).

As a general rule, modifications that have the least effect
on side-chain character should have the least effect on
protein structure and properties. Modifications of lysine
residues that retain their usual cationic charge have, for
example, generally been found to have relatively little
effect on the biological activities and other properties of
many proteins. Complete guanidination of the ε-amino
groups in tuna heart cytochrome c thus has almost no
effect on its UV-visible spectrum, its redox potential, or
its activity in a standard succinate oxidase assay system
(99). The catalytic activity of papain is also essentially
unaffected by complete guanidination (100). Amidination
or reductive alkylation of amino groups, both of which
also retain the cationic charge, are generally preferred
today, however, as both of those reactions take place under
milder conditions (4, 5, 25).

SIDE-CHAIN REACTIVITIES 

The reactivities of side-chain groups in proteins vary
considerably depending on their locations and the influence
of nearby residues with which they interact. Under
appropriate conditions, differences in reactivity can be
used to characterize the environments of such side-chain
groups. Kaplan and co-workers (101, 102) and others
(103, 104), for example, have developed procedures
to determine the relative reactivities of certain types of
side chains from the extent of their reaction with trace
levels of one of several simple reagents. The intrinsic
reactivity and pKa of each reacting group can be determined
by comparing its reaction to that of a simple model
compound over a range of pH values.
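
One simple way to picture such a determination is sketched below in Python. It assumes, as a simplification not stated in the text, that a side chain reacts only in its deprotonated form, so that the observed rate follows k_obs = k_max / (1 + 10^(pKa - pH)). The function names, the grid-search fitting, and the synthetic data are all hypothetical choices made for illustration; they are not the procedure of the cited authors.

```python
def apparent_rate(ph, k_max, pka):
    """Observed rate for a group that reacts only when deprotonated."""
    return k_max / (1.0 + 10.0 ** (pka - ph))


def fit_pka(ph_values, rates):
    """Least-squares grid search over pKa (3.00 to 12.00 in steps of 0.01).

    For each candidate pKa the best k_max has a closed form, so only
    one parameter is actually scanned.
    """
    best = None
    for step in range(300, 1201):
        pka = step / 100.0
        frac = [1.0 / (1.0 + 10.0 ** (pka - ph)) for ph in ph_values]
        k_max = sum(f * r for f, r in zip(frac, rates)) / sum(f * f for f in frac)
        sse = sum((r - k_max * f) ** 2 for f, r in zip(frac, rates))
        if best is None or sse < best[0]:
            best = (sse, pka, k_max)
    return best[1], best[2]


def relative_reactivity(k_max_protein, k_max_model):
    """Intrinsic reactivity of the protein group relative to the model compound."""
    return k_max_protein / k_max_model


if __name__ == "__main__":
    phs = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
    observed = [apparent_rate(ph, k_max=2.0, pka=7.5) for ph in phs]  # synthetic data
    print(fit_pka(phs, observed))
```

Fitting the same expression to the model compound gives a reference k_max, and the ratio of the two limiting rates is one way to express the intrinsic reactivity of the protein side chain.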

For identical side-chain groups at different sequence
positions, the observed differences in pKa and reactivity
are assumed to reflect differences in local environment.
Side chains that experience a change in environment upon
the binding of a ligand, complexation with another protein,
a change in redox state, or the like can be identified
by comparing the extent of their reaction in the two
different states. This approach has been used primarily
to evaluate the environments of the nucleophilic side
chains (amino groups and histidine and tyrosine side
chains) in proteins (105, 106).

Different local environments may either suppress or
enhance the reactivities of individual side-chain groups.
Unusually reactive side chains are usually relatively easy
to distinguish from others on the basis of their reactivity
and are, in many cases, also those required for biological
activity. Rates of inactivation, which may differ
from overall rates of modification, can be used in many
cases to characterize the reactivity and, sometimes, the
number of active site residues (107-109).

In many relatively simple cases, rates of inactivation
can be correlated with those for the modification of one
or more individual amino acid residues. The catalytic
subunit of rabbit muscle cAMP-dependent protein kinase,
for example, has only two thiol groups and undergoes a
biphasic reaction with DTNB (110). Its rapid inactivation
under those conditions correlates with the initial,
rapid phase of modification, which has been shown to
reflect the reaction of one thiol group about 17 times faster
than the other. In this and other cases where rates of
inactivation exceed overall rates of modification, selectively
labeled derivatives, modified only at the active site,
can often be isolated and characterized (111-113).
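
The biphasic behavior described above can be sketched numerically. The Python example below assumes pseudo-first-order kinetics (reagent in large excess), two thiol groups whose rate constants differ by the 17-fold ratio quoted in the text, and activity that is lost only when the faster group reacts. The function names and the value of the slow rate constant are hypothetical.

```python
import math


def fraction_unmodified(t, k_slow, rate_ratio=17.0):
    """Average fraction of the two thiol groups still unmodified at time t."""
    k_fast = rate_ratio * k_slow
    return 0.5 * (math.exp(-k_fast * t) + math.exp(-k_slow * t))


def fraction_active(t, k_slow, rate_ratio=17.0):
    """Remaining activity, assuming only the fast-reacting thiol affects activity."""
    return math.exp(-rate_ratio * k_slow * t)


if __name__ == "__main__":
    k_slow = 0.01  # assumed value, arbitrary reciprocal time units
    for t in (0.0, 10.0, 30.0, 100.0):
        print(t, round(fraction_unmodified(t, k_slow), 3), round(fraction_active(t, k_slow), 3))
```

Under these assumptions activity falls almost to zero while roughly half of the total thiol content is still unmodified, which is the window in which a derivative labeled only at the more reactive site could be isolated.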

Activities remaining at various stages of partial modification
can also be used, in some cases, to estimate the
number of essential residues according to a procedure
first described by Tsou in 1962 (114). The decreased
iron-binding capacity of chicken egg white ovotransferrin after
partial modification by phenylglyoxal, for example, suggests
an arginine residue is required for each of its two
bound Fe3+ ions (76). In the more complicated case of
transketolase, two arginine residues per dimer appear to
be required for activity, but one appears to react with
phenylglyoxal about 40 times faster than the other (115).
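
The simplest form of this kind of analysis can be sketched as follows. The Python example assumes the limiting case in which all residues of the modified type react at the same rate and i of them are essential, so that the remaining activity a and the fraction of residues still unmodified x are related by a = x^i. The function name, the search range, and the synthetic data are assumptions for illustration; cases with unequal rates, such as the transketolase example, need a more elaborate treatment.

```python
def estimate_essential_residues(fraction_unmodified, activity, max_i=4):
    """Return the integer i for which activity**(1/i) best matches the
    fraction of residues remaining unmodified (equal-reactivity case)."""
    best_i, best_sse = None, None
    for i in range(1, max_i + 1):
        sse = sum(
            (a ** (1.0 / i) - x) ** 2
            for x, a in zip(fraction_unmodified, activity)
        )
        if best_sse is None or sse < best_sse:
            best_i, best_sse = i, sse
    return best_i


if __name__ == "__main__":
    x = [1.0, 0.8, 0.6, 0.4, 0.2]   # fraction of residues unmodified
    a = [v ** 2 for v in x]         # synthetic activities for two essential residues
    print(estimate_essential_residues(x, a))
```

In graphical terms this corresponds to plotting a^(1/i) against the extent of modification for several trial values of i and choosing the value that gives a straight line.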

SPECTROSCOPIC AND FLUORESCENT LABELS 

A number of important procedures requiring the incor- 
poration of spectroscopic or fluorescent labels have been 



14 




developed to characterize certain structural features of 
proteins. Fluorescence lifetimes and quantum yields of 
many different fluorescent groups and their sensitivities 
to quenching by acrylamide, iodide, and other substances
can, for example, be used to evaluate environments
in the vicinity of residues to which those groups
have been attached (15, 116). Fluorescence energy transfer
measurements are also widely employed to estimate
distances between certain internal, or intrinsic, chromophores
and various selectively introduced, extrinsic,
fluorescent labels and, in some cases, between selectively
introduced, extrinsic, donor-acceptor pairs (117,
118). Iodoacetamidofluorescein, dansyl chloride, and
N-(1-pyrenyl)maleimide are three examples from a very large
number of fluorescent labels that have been used for such 
purposes. Most may be considered to be analogues of 
commonly used group-selective reagents and their reac- 
tion characteristics may be predicted accordingly. 
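
Distance estimates from energy transfer usually rest on the Förster relation, E = 1/(1 + (r/R0)^6), where E is the transfer efficiency, r is the donor-acceptor distance, and R0 is the distance at which transfer is 50% efficient for a given donor-acceptor pair. The relation is standard but is not stated in the text above, and the function names below are hypothetical. A minimal sketch:

```python
def transfer_efficiency(r, r0):
    """Forster transfer efficiency for donor-acceptor distance r.

    r and r0 must be in the same units (for example, angstroms).
    """
    return 1.0 / (1.0 + (r / r0) ** 6)


def distance_from_efficiency(e, r0):
    """Invert the Forster relation to estimate the distance from a measured efficiency."""
    if not 0.0 < e < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / e - 1.0) ** (1.0 / 6.0)
```

Because of the sixth-power dependence, the estimate is most sensitive when r is close to R0 and becomes unreliable when the efficiency is near 0 or 1.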

An extensive list of such reagents, with brief descrip- 
tions of their principal reaction and emission and exci- 
tation characteristics, has been presented by Haugland 
(119). Procedures to attach nitroxide moieties, for example
the reaction of 4-(2,2,6,6-tetramethyl-1-oxypiperidin-4-yl)-2-(fluorosulfonyl)benzamide
with chymotrypsin, have
also been employed to obtain information concerning the 
protein environment and to detect conformational changes 
by EPR spectroscopy (17, 120). 

CROSS-LINKING AND IMMOBILIZATION 

Cross-linking of proteins and their immobilization, either 
by attachment to an insoluble support or by various other 
means, have a long and important history. The former 
is sometimes employed to increase the stability of pro- 
teins or of certain conformational relationships in pro- 
teins, to couple two or more different proteins (e.g., to 
join different activities into a single molecule), to iden- 
tify or characterize the nature and extent of certain pro- 
tein-protein interactions, and, in other cases, to deter- 
mine distances between reactive groups in or between 
protein subunits (36, 37, 121-125). Proteins are sometimes
immobilized to facilitate their reuse and their separation
from other products and (in some cases) to increase
their stability. A large number of different procedures, 
including physical as well as chemical procedures, have 
been developed to immobilize proteins, and many reviews, 
symposia proceedings, and books on this subject are avail- 
able (126-130). 

A large number of different types of cross-linking or, 
as they are sometimes called, bifunctional reagents have 
been described. They include so-called zero-length cross- 
linking agents that bring about the direct formation of 
covalent bonds between existing amino acid side chain 
groups. Water-soluble carbodiimides, which bring about
the formation of amide linkages between carboxyl
groups of aspartate or glutamate and the ε-amino groups
of lysine side chains, appear to be the most prominent
zero-length cross-linking agents (123, 131-133). Disulfide
bonds obtained from existing thiol groups would also,
presumably, be considered zero-length cross-links (134, 
135). Such linkages appear to be formed only when the
reacting groups are in close proximity. 

Other cross-linking agents may be organized accord- 
ing to the type(s) of reactive groups, their side chain reac- 
tivity, their hydrophobicity or hydrophilicity, and the 
length or distance between the reactive groups; whether 
the two, or in some cases more (136), reactive groups are 
the same or different (i.e., "homobifunctional" or
"heterobifunctional" reagents), whether the structure
connecting the reactive groups is readily cleavable, and whether
the groups are membrane permeable or impermeable, and 
according to various other criteria. A list of the most 
widely used types of cross-linking agents and a few brief 
comments on some of their significant properties are presented
in Table III. A much more extensive list of
cross-linking agents has been presented by Ji (125).

The reactivities of cross-linking agents, except for one 
or two special cases, are very similar to those of the cor- 
responding monofunctional reagents. The initial reac- 
tion with a protein is presumably, in most cases, a sim- 
ple second-order process, not seriously affected by the 
second reactive group. The latter's reaction, however, is
completely dependent on the availability of a second appro- 
priate side chain which, for fast, efficient cross-linking, 
must be both nearby and in an appropriate orientation. 
Cross-linking agents with different lengths, different
stereochemical configurations (some with little and others
with a great deal of conformational flexibility), and with 
different side-chain specificities have been developed to 
fulfill different needs. Distances between potentially reac- 
tive side chains in the same or different subunits of some 
oligomeric proteins have, for example, been estimated by 
comparing rates and yields of cross-link formation with 
a series of cross-linking agents differing in length, stere- 
ochemical configuration, and side-chain reactivity (139, 
155, 146). 

The importance of side-chain proximity in these reac- 
tions is perhaps most evident in the case of cross-link- 
ing agents that undergo hydrolysis or some other inacti- 
vation process in addition to their cross-linking of pro- 
teins. The use of bifunctional imidoesters to characterize 
oligomeric proteins, for example, is based on the forma- 
tion of recognizable SDS gel electrophoretic patterns, 
reflecting the formation of cross-links between adjacent 
subunits (139, 138). Like the cross-links within a sub- 
unit, those between subunits are formed only when two 
amino groups are in close and appropriate proximity. Cross- 
links between other than adjacent subunits are largely 
precluded by the hydrolytic instability of the monofunc- 
tional imidoester intermediates. The importance of hydro- 
lytic stability on yields of cross-linked products has been 
discussed by Staros (37, 156). 
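
The dependence of yield on proximity and hydrolytic stability can be made concrete with a simple partitioning model. It is an illustrative assumption, not a treatment given by the cited authors: once one end of the reagent is attached, the free reactive end is taken either to form the cross-link (first-order rate constant k_xl, large only when a suitable side chain is nearby and well oriented) or to be lost by hydrolysis (first-order rate constant k_h). The names and any numerical values are hypothetical.

```python
def crosslink_yield(k_xl, k_h):
    """Fraction of monofunctionally attached reagent that goes on to cross-link.

    Assumes two competing first-order processes for the free reactive end:
    cross-linking (k_xl) and hydrolysis (k_h), in the same units.
    """
    if k_xl < 0 or k_h < 0 or (k_xl + k_h) == 0:
        raise ValueError("rate constants must be non-negative and not both zero")
    return k_xl / (k_xl + k_h)
```

With equal rate constants half of the attached reagent is wasted, and when hydrolysis is 100 times faster than cross-linking the yield falls below 1%, which is why hydrolytically unstable intermediates cross-link only closely apposed groups.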

Of the 20 or so amino acid side chains normally present 
in proteins, ε-amino groups of lysine residues are usually
among the most abundant and most accessible of the 
potentially reactive groups. A relatively large propor- 
tion of the most commonly used cross-linking agents are 
therefore amino group selective reagents (i.e., imidoesters,
N-hydroxysuccinimide esters, activated aryl fluorides,
etc.). Most of them, however, also undergo fairly
rapid hydrolysis in addition to their reaction with amino 
groups, which, except for cases involving close proximity,
seriously limits the yields that may be obtained.
Glutaraldehyde, which does not hydrolyze or become otherwise
inactivated over long periods of time, is widely used
to immobilize enzymes by cross-linking and to stabilize 
their adsorption to or entrapment in various materials 
(157, 158). The nature of its reactions with proteins may 
involve some Schiff base formation but is clearly much 
more complicated than that and not completely under- 
stood (137, 159, 160). 

The high reactivities of thiol groups with N-ethylmaleimide,
iodoacetate, and many related α-halocarbonyl compounds have
led to the development of many cross-linking agents containing
comparable maleimide and α-halocarbonyl moieties. Under the
conditions usually employed for cross-linking, the latter are
much more stable to hydrolysis than the amino group reagents
mentioned above and the yields of cross-linked products are,
therefore, usually somewhat less dependent on side chain
proximity (161, 162).

Table III. Homobifunctional and Heterobifunctional Protein Cross-Linking Agents (a)

Each entry gives the agent, the references cited, and a description.

Homobifunctional

glutaraldehyde (refs 137)
    available as 25% aqueous solution; very effective reaction with amino
    groups and perhaps other nucleophilic groups; contains polymeric and
    other unknown materials; the nature of the reaction(s) is not known;
    slow progressive changes proceed long after the initial irreversible
    coupling

dimethyl suberimidate (DMS) (refs 138, 139)
    a water-soluble solid; reacts only with amino groups and does not
    eliminate their cationic charge; reaction at pH 8 or above (optimal
    pH range illegible in source); t1/2 ≈ 46 min at pH 8.6 and 26 °C;
    ~11-Å span; many related reagents with different spans, some readily
    cleavable, are available or can be easily synthesized

disuccinimidyl suberate (DSS) (refs 140, 141)
    a water-insoluble solid; must usually be dissolved in DMSO or other
    water-miscible organic solvent; reacts with amino groups at pH 7 or
    above; reaction rates increase with pH; t1/2 ≈ 4-6 h at pH 7; ~11-Å
    span; many related reagents with different spans, hydrophilic spacer
    arms, some cleavable and water-soluble; sulfosuccinimide esters are
    available

bismaleimidohexane (BMH) (refs 142, 143)
    a water-insoluble solid; must usually be dissolved in DMF or other
    water-miscible organic liquid; reacts with thiol groups at pH ~6-8;
    span (value missing in source); many related reagents with different
    span lengths; more hydrophilic spacer arms and cleavable analogs are
    available

p-phenylenedimaleimide (refs 144-146)
    a water-insoluble solid; must usually be dissolved in water-miscible
    organic solvent; reacts with thiol groups at pH ~6-8; 12-Å span; ortho
    and meta isomers are also available; less stable than aliphatic
    maleimides

Heterobifunctional

m-maleimidobenzoic acid N-hydroxysuccinimide ester (MBS) (refs 147, 148)
    a water-insoluble solid; must usually be dissolved in water-miscible
    organic liquid; initial reaction with amino group component at pH ~7-8
    followed by coupling with thiol component at pH ~6-8; ~10-Å span; more
    water soluble sulfosuccinimide ester is also available

N-succinimidyl 4-(N-maleimidomethyl)cyclohexane-1-carboxylate (SMCC) (refs 149, 150)
    a water-insoluble solid; must usually be dissolved in water-miscible
    organic solvent; reaction characteristics very similar to those of MBS;
    ~12-Å span; more water soluble sulfosuccinimide ester is also available

N-succinimidyl 3-(2-pyridyldithio)propionate (SPDP) (refs 151, 152)
    a water-insoluble solid; must usually be dissolved in a water-miscible
    organic solvent; initial reaction with the amino component at pH
    ~7-8.5 followed by either coupling to thiol component at pH 7 or
    above or treatment with DTT followed by coupling to maleimidylated
    protein; ~7-Å span

2-iminothiolane ("Traut's reagent") (refs 153, 154)
    a water-soluble solid; reacts only with amino groups at pH 7-10 without
    eliminating their charge; reaction may be followed with DTNB; span
    (value illegible in source); may be coupled directly to MBS-, SMCC-,
    or SPDP-treated proteins

(a) Many more cross-linking agents have been described. Those included appear to be among the most widely used and most important at
the present time. Please consult references in the text for additional examples.

A large number of heterobifunctional cross-linking 
reagents have been developed which usually contain a 
thiol reactive and an amino group reactive moiety.
N-Alkyl- or N-arylmaleimide and α-halocarbonyl groups are
the most common of the former and N-hydroxysuccinimide
esters appear to be the most common of the latter.
To increase aqueous solubility, sodium salts of sulfonated
N-hydroxysuccinimide esters are also commonly
employed (163). In addition to the two reactive
groups a variety of different types of connecting struc- 
tures or spacer arms have been employed. The nature 
of the spacer arm may, of course, also have important 
consequences. Longer spacer arms are usually assumed 
to be more effective for coupling larger proteins or those 
where the potentially reactive side chains are sterically 
protected. The conformational flexibility, hydrophilicity
or hydrophobicity, and the "cleavability" of the spacer
arm are also important considerations. N-Alkylmaleimides
are also generally more stable than their aryl counterparts
(162, 164).

Photoactivatable heterobifunctional cross-linking agents
are particularly useful for identifying interacting components
in complicated biological systems (165). Wood and
O'Dorisio (166), for example, used N-succinimidyl 4-azidobenzoate,
N-succinimidyl 6-[(4'-azido-2'-nitrophenyl)amino]hexanoate,
and two nonphotoactivatable homobifunctional
cross-linking agents to identify vasoactive intestinal
peptide receptors in human lymphoblasts by their
coupling to 125I-labeled vasoactive intestinal peptide. A
photoactive derivative of an N-formylated chemotactic peptide,
prepared by reaction with the last-mentioned
photoactivatable agent, has also been used to characterize
the N-formyl peptide receptors of human polymorphonuclear
leukocytes (167).

The initial reaction with photoactivatable cross-link- 
ing agents is usually conducted in the dark so that the 
photoreactive group is inert. Cross-linking is then initi- 
ated in a subsequent step involving exposure to light. 
Azido groups which are converted into highly reactive
nitrenes and diazo moieties (i.e., diazoacetyl, diazo ketones, 
etc.) which give even more reactive carbenes upon pho- 
toactivation are the most common photoactivatable groups 
in use at this time (2, 3). Being so reactive, both react 
relatively indiscriminately with OH, NH, CH, and C=C 
moieties in their vicinity and have short half-lives. Their 
reaction with surrounding solvent usually precludes reac- 
tion with groups not in their immediate vicinity and leads 
to quite low yields. The detection of cross-linked prod- 
ucts thus often provides a good record of spatial relation- 
ships at the moment of photolysis but the yields are not 
adequate for most preparative purposes. 

Heterobifunctional cross-linking agents are particu- 
larly useful for conjugating different proteins. The dif- 
ferent side-chain reactivities of the two reactive groups, 
for example, usually permit the coupling to be carried 
out in a stepwise manner which allows, in some cases, 
for partial purification and, if desired, characterization 
of intermediates prior to the actual conjugation. Due to 
the hydrolytic instability of the most important groups 
directed at amino side chains, the first step usually involves
addition of the cross-linker to the amino groups of one 
member of the future hybrid pair (which either has no
thiol groups or where thiols, if present, are at least temporarily
blocked). The removal of unreacted or hydrolyzed
reagent and other unwanted substances is usually
possible at this stage. The resulting derivative is then 
directly coupled via the introduced thiol-reactive male- 
imido or a-halocarbonyl group(s) to the thiol-containing 
member of the intended hybrid pair. 

An artificial antibody-ricin conjugate, for example, has 
been prepared by treating ricin with m-maleimidobenzoyl
N-hydroxysuccinimide ester and then incubating the
resulting m-maleimidobenzoyl derivative with a partially
reduced monoclonal antibody (148). The formation
of unwanted homoprotein conjugates is precluded
by such two-step procedures, and purification of the result- 
ing hybrid conjugates by exclusion chromatography is usu- 
ally rather easy since they should be significantly larger 
than any of their precursors. Iodoacetyl derivatives of
avidin, alkaline phosphatase, and at least four other proteins
are commercially available.

Several reagents have been employed to introduce thiol 
groups into proteins, which may then be employed for 
conjugation to other proteins or various other materials. 
N-Acetylhomocysteine thiolactone (168), (S-acetylthio)succinic
anhydride (169), S-acetyl N-succinimidylthioacetate
(170), 2-iminothiolane (153), and
N-succinimidyl 3-(2-pyridyldithio)propionate (151), for example,
can all be used under mildly alkaline conditions to intro- 
duce thiol groups into proteins. In the second and third 
cases, the acetyl moiety must subsequently be removed, 
usually by treatment with hydroxylamine, to release the 
thiol group and, in the last case, a small amount of DTT 
or some other simple thiol must be used to effect a comparable
cleavage of the 2-pyridyl disulfide moiety. The
resulting thiol groups potentially can be coupled to many 
different maleimidyl or a-halocarbonyl groups includ- 
ing, for example, those of certain protein-maleimidyl con- 
jugates as follows (171, 150): 



[Reaction scheme (2): structures lost in extraction; the surviving labels are P-NH2, DTT, and X.]



Even more important, probably, is the ability of the lat- 
ter substituent to undergo direct coupling with the thiol 
groups of other proteins as follows (152, 172): 



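[Reaction scheme (3): structures lost in extraction; the surviving fragment shows the 2-pyridyl disulfide derivative of P reacting with P'-SH.]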



Several 2-pyridyl disulfide-protein conjugates are commercially
available. The susceptibility of disulfide linkages
to cleavage by low molecular weight thiols, however,
appears to preclude many applications of such conjugates,
including most of those involving exposure to
physiological conditions.

2-Iminothiolane is probably the most important reagent 
for introducing thiol groups into proteins. It is quite water 
soluble, whereas the others really are not, it reacts rap- 
idly with amino groups at pH 7 (or preferably a little 
above), and it does not require an additional activation 
step to effect release of the thiol moiety. It alone pre- 
serves the cationic charges of the modified amino groups. 
As with the other reagents used to introduce thiol groups, 
those introduced via reaction with 2-iminothiolane can
be used to effect oxidative coupling to other protein thi- 
ols or may react with various maleimidyl or a-halocar- 
bonyl groups, as follows (173, 154): 




[Reaction scheme (4): structures lost in extraction; the surviving labels are S-H and P'.]



CONCLUSION 

Space and time limitations have precluded the discus- 
sion of many important related subjects. We had hoped, 
in particular, to discuss the radiolabeling of proteins. Biot- 
inylation also deserves serious discussion. We apologize 
to the many authors whose works we have failed to cite 
and particularly to those whose results we may have mis- 
interpreted or misrepresented. We would also like to call 
the readers' attention to a number of reviews and books 
on this subject, where more complete information can 
be obtained (174-183). 

ACKNOWLEDGMENT 

Financial support to GEM was received from Solar 
Energy Research Institute and to REF from U.S. National 
Institutes of Health Grant GM23817. The assistance of 
Shirley Miller in preparing the manuscript is greatly appre- 
ciated. 

LITERATURE CITED 

(1) Colman, R. F. (1983) Affinity labeling of purine nucleotide 
sites of proteins. Annu. Rev. Biochem. 52, 67-91.

(2) Bayley, H. (1983) Photogenerated Reagents in Biochemis- 
try and Molecular Biology Elsevier, New York. 

(3) Knowles, J. R. (1972) Photogenerated reagents for biologi- 
cal receptor-site labels. Acc. Chem. Res. 5, 156-160. 

(4) Hunter, M. J. and Ludwig, M. L. (1962) The reaction of 
imidoesters with proteins and related small molecules. J. 
Am. Chem. Soc. 84, 3491-3504.

(5) Means, G. E. and Feeney, R. E. (1968) Reductive alkylation
of amino groups in proteins. Biochemistry 7, 1366-1371.

(6) Goldstein, L., Levin, Y., and Katchalski, E. (1964) A
water-insoluble polyanionic derivative of trypsin. II. Effect of the
polyelectrolyte carrier on the kinetic behavior of bound trypsin.
Biochemistry 3, 1913-1919.

(7) Nishikawa, A. H., Morita, R. Y., and Becker, R. R. (1968) 
Effects of the solvent medium on polyvalylribonuclease aggregation.
Biochemistry 7, 1506-1513.

(8) Ampon, K. and Means, G. E. (1988) Immobilization of proteins
on organic polymer beads. Biotechnol. Bioeng. 32, 689-
697. 

(9) Abuchowski, A., van Es, T., Palczuk, N. C., and Davis, F.
F. (1977) Alteration of immunological properties of bovine
serum albumin by covalent attachment of polyethylene glycol.
J. Biol. Chem. 252, 3578-3581.






(10) Veronese, F. M., Largajolli, R., Boccù, E., Benassi, C. A.,
and Schiavon, O. (1985) Surface modification of proteins. Activation
of monomethoxy-polyethylene glycols by phenylchloroformates
and modification of ribonuclease and superoxide
dismutase. Appl. Biochem. Biotechnol. 11, 141-152.

(11) Raftery, M. A. and Cole, R. D. (1966) On the aminoethylation
of proteins. J. Biol. Chem. 241, 3457-3461.

(12) Dixon, H. B. F. and Perham, R. N. (1968) Reversible blocking
of amino groups with citraconic anhydride. Biochem. J.
109, 312-314.

(13) Rice, R. H., Means, G. E., and Brown, W. D. (1977) Stabilization
of bovine trypsin by reductive methylation.
Biochim. Biophys. Acta 492, 316-321.

(14) Parkinson, D. and Redshaw, J. D. (1984) Visible labeling
of proteins for polyacrylamide gel electrophoresis with dab- 
syl chloride. Anal. Biochem. 141, 121-136. 

(15) Hudson, E. N. and Weber, G. (1973) Synthesis and characterization
of two fluorescent sulfhydryl reagents.
Biochemistry 12, 4154-4161.

(16) Weltman, J. K., Szaro, R. P., Frackelton, A. R., Dowben,
R. M., Bunting, J. R., and Cathou, R. E. (1973) N-(3-Pyrene)maleimide:
a long lifetime fluorescent sulfhydryl reagent.
J. Biol. Chem. 248, 3173-3177.

(17) Berliner, L. J. and Wong, S. S. (1974) Spin-labeled sulfonyl
fluorides as active site probes of protease structure.
J. Biol. Chem. 249, 1668-1682.

(18) Hunter, W. M. and Greenwood, F. C. (1962) Preparation
of I-131 labelled human growth hormone of high specific activity.
Nature 194, 492-496.

(19) Rice, R. H. and Means, G. E. (1971) Radioactive labeling
of protein in vitro. J. Biol. Chem. 246, 831-832. 

(20) Bolton, A. E. and Hunter, W. M. (1973) The labelling of
proteins to high specific radioactivities by conjugation to a
125I-containing acylating agent. Biochem. J. 133, 529-539.

(21) Meares, C. F., McCall, M. J., Deshpande, S. V., DeNardo,
S. J., and Goodwin, D. A. (1988) Chelate radiochemistry: Cleav- 
able linkers lead to altered levels of radioactivity in the liver. 
Int. J. Cancer 2, 99-102. 

(22) Langer, R. and Brown, E. (1985) Controlled release and 
magnetically modulated release systems for macromolecules.
Methods Enzymol. 112, 399-422.

(23) Senyei, D. and Widder, K. J. (1985) Biophysical drug targeting:
Magnetically responsive albumin microspheres.
Methods Enzymol. 112, 56-67.

(24) Petsko, G. A. (1985) Preparation of heavy-atom deriva- 
tives. Methods Enzymol. 114, 147-156. 

(25) Jentoft, N. and Dearborn, D. G. (1979) Labeling of pro- 
teins by reductive methylation using sodium cyanoborohydride.
J. Biol. Chem. 254, 4359-4365.

(26) Maekawa, K. and Liener, I. E. (1960) Properties of the
glucosylamidyl derivative of trypsin. Arch. Biochem.
Biophys. 91, 101-107.

(27) Lee, H. S., Sen, L. C., Clifford, A. J., Whitaker, J. R., and
Feeney, R. E. (1979) Preparation and nutritional properties 
of caseins covalently modified with sugars. Reductive alky- 
lation of lysines with glucose, fructose or lactose. J. Agric. 
Food Chem. 27, 1094-1098.

(28) Chen, V. J. and Wold, F. (1984) Neoglycoproteins: preparation
of noncovalent glycoproteins through high-affinity
protein-(glycosyl) ligand complexes. Biochemistry 23,
3306-3311.

(29) Wong, W. S. D., Kristjansson, M. M., Osuga, D. T., and
Feeney, R. E. (1985) 1-Deoxyglycitolation of protein amino
groups and their regeneration by periodate oxidation. Int.
J. Peptide Protein Res. 26, 55-62. 

(30) Hofmann, K., Titus, G., Montibeller, J. A., and Finn, F.
M. (1982) Avidin binding of carboxyl-substituted biotin ana- 
logues. Biochemistry 21, 978-984. 

(31) Wilchek, M. and Bayer, E. A. (1988) The avidin-biotin
complex in bioanalytical applications. Anal. Biochem. 171,
1-32.

(32) Gross, E. (1967) The cyanogen bromide reaction.
Methods Enzymol. 11, 238-255.

(33) Mahoney, W. C. and Hermodson, M. A. (1979) High-yield
cleavage of tryptophan peptide bonds by o-iodosobenzoic acid.
Biochemistry 18, 3810-3814.

(34) Kaiser, E. T., Lawrence, D. S., and Rokita, S. E. (1986) 
The chemical modification of enzymatic specificity. Annu. 
Rev. Biochem. 54, 565-595.

(35) Osuga, D. T., Feather, M. S., Shah, M. J., and Feeney, R.
E. (1989) Modification of galactose and N-acetylgalactosamine
residues by oxidation of C-6 hydroxyls to the
hydes followed by reductive amination: Model systems and 
antifreeze glycoproteins. J. Protein Chem. 8, 519-528. 

(36) Han, K.-K., Richard, C., and Delacorte, A. (1984) Chemical
cross-links of proteins by using bifunctional reagents.
Int. J. Biochem. 16, 129-145.

(37) Staros, J. V. (1988) Membrane-impermeant cross-linking
reagents: Probes of structure and dynamics of membrane proteins.
Acc. Chem. Res. 21, 435-441.

(38) Poznansky, M. J. (1986) Tailoring Proteins for More Effective
Use as Therapeutic Agents. In Protein Tailoring for Food
and Medical Uses (R. E. Feeney and J. R. Whitaker, Eds.) 
pp 317-337, Marcel Dekker, New York. 

(39) Poznansky, M. (1988) Soluble enzyme conjugates: New possibilities
for enzyme replacement therapy. Methods Enzymol.
137, 566-574.

(40) Bode, C., Runge, M. S., Newell, J. B., Matsueda, G. R.,
and Haber, E. (1987) Characterization of an antibody-urokinase
conjugate, a plasminogen activator targeted to fibrin.
J. Biol. Chem. 262, 10819-10823.

(41) Beyzavi, K., Hampton, S., Kwasowski, P., Fickling, S., Marks,
V., and Clift, R. (1987) Comparison of horseradish peroxidase
and alkaline phosphatase-labelled antibodies in enzyme
immunoassays. Ann. Clin. Biochem. 24, 145-152. 

(42) Cumber, A. J., Forrester, J. A., Foxwell, B. M. J., Ross, W.
C. J., and Thorpe, P. E. (1985) Preparation of antibody- 
toxin conjugates. Methods Enzymol. 112, 207-225. 

(43) Faulstich, H. and Fiume, L. (1985) Protein conjugates of 
fungal toxins. Methods Enzymol. 112, 226-237.

(44) Urdahl, D. L. and Hakomori, S. (1980) Tumor-associated 
ganglio-N-triosylceramide target for antibody dependent, avidin
mediated drug killing of tumor cells. J. Biol. Chem. 255,
10509-10516. 

(45) Hoare, D. G. and Koshland, D. E. (1967) A method for the
quantitative modification and estimation of carboxylic acid
groups in proteins. J. Biol. Chem. 242, 2447-2453.

(46) Barman, T. E. and Koshland, D. E. (1967) A colorimetric
procedure for the quantitative determination of tryptophan
residues in protein. J. Biol. Chem. 242, 5771-5776.

(47) Fields, R. (1972) The rapid determination of amino groups
with TNBS. Methods Enzymol. 25, 464-468.

(48) Ellman, G. L. (1959) Tissue sulfhydryl groups. Arch.
Biochem. Biophys. 82, 70-77.

(49) Olcott, H. S. and Fraenkel-Conrat, H. (1947) Specific group 
reagents for proteins. Chem. Rev. 41, 151-197. 

(50) Herriott, R. M. (1947) Reactions of native proteins with
chemical reagents. Adv. Protein Chem. 3, 161-225. 

(51) Balls, A. K. and Jansen, E. F. (1952) Stoichiometric inhibition
of chymotrypsin. Adv. Enzymol. 13, 321-343.

(52) Fraenkel-Conrat, H., Bean, R. S., and Lineweaver, H. (1949) 
Essential groups for the interaction of ovomucoid (egg white 
trypsin inhibitor) and trypsin, and for tryptic activity. J.
Biol. Chem. 177, 385-403.

(53) Fraenkel-Conrat, H. and Olcott, H. S. (1948) The reaction 
of formaldehyde with proteins. V. Cross-linking between 
amino and primary amide or guanidyl groups. J. Am. Chem. 
Soc. 70, 2673-2684. 

(54) Fraenkel-Conrat, H. and Feeney, R. E. (1950) The
metal-binding activity of conalbumin. Arch. Biochem. 29,
101-113.

(55) Moore, S. and Stein, W. H. (1963) Chromatographic deter- 
mination of amino acids by the use of automatic recording 
equipment. Methods Enzymol. 6, 819-831.

(56) Edman, P. and Begg, G. (1967) A protein sequenator. Eur. 
J. Biochem. 1, 80-91. 

(57) Wofsy, L., Metzger, H., and Singer, S. J. (1962) Affinity
labeling: A general method for labeling the active sites of
antibody and enzyme molecules. Biochemistry 1, 1031-1039.

(58) Schoellmann, G. and Shaw, E. (1963) Direct evidence for
the presence of histidine in the active center of chymotrypsin.
Biochemistry 2, 252-256.

(59) Ehrlich, R. S. and Colman, R. F. (1987) Characterization
of an active site peptide modified by the substrate analogue
3-bromo-2-ketoglutarate on a single chain of dimeric
NADP+-dependent isocitrate dehydrogenase. J. Biol.
Chem. 262, 12614-12619.

(60) Hartman, F. C., LaMuraglia, C. M., Tomozawa, Y., and
Wolfenden, R. (1976) The influence of pH on the interaction
of inhibitors with triosephosphate isomerase and determination
of the pKa of the active-site carboxyl group.
Biochemistry 14, 5274-5291.

(61) O'Connell, E. L. and Rose, I. A. (1973) Affinity labeling
of phosphoglucose isomerase by 1,2-anhydrohexitol 6-phosphates.
J. Biol. Chem. 248, 2226-2231.

(62) Schray, K. J., O'Connell, E. L., and Rose, I. A. (1973) Inactivation
of muscle triose phosphate isomerase by D- and
L-glycidol phosphate. J. Biol. Chem. 248, 2214-2218.

(63) Pinkofsky, H. D., Ginsburg, A., Reardon, J., and Heinrikson,
R. L. (1984) Lysyl residue 47 is near the subunit
ATP-binding site of glutamine synthetase from Escherichia coli.
J. Biol. Chem. 259, 9616-9622.

(64) Easterbrook-Smith, B., Wallace, J. C, and Keech, D. B. 
(1976) Pyruvate carboxylase: Affinity labelling of the magnesium
adenosine triphosphate binding site. Eur. J. Biochem.
62, 125-130.

(65) Wescott, K. R., Olwin, B. B., and Storm, D. P. (1980) Inhibition
of adenylate cyclase by the 2'-3'-dialdehyde of adenosine
triphosphate. J. Biol. Chem. 255, 8767-8776.

(66) Fischer, E. H., Kent, B. B., Snyder, E. R., and Krebs, E.
G. (1958) The reaction of sodium borohydride with muscle 
phosphorylase. J. Am. Chem. Soc. 80, 2906-2907. 

(67) Dilanni, C. L. and Villafranca, J. J. (1989) Identification 
of amino acid residues modified by pyridoxal 5'-phosphate 
in Escherichia coli glutamine synthetase. J. Biol. Chem.
264, 8686-8691. 

(68) Basu, A., Kedar, P., Wilson, S., and Modak, M. J. (1989)
Active-site modification of mammalian DNA polymerase β
with pyridoxal 5'-phosphate. Mechanism of inhibition and 
identification of lysine 71 in the deoxynucleoside triphos- 
phate binding pocket. Biochemistry 28, 6305-6309. 

(69) Hollemans, M., Runswick, M. J., Fearnley, I. M., and Walker,
J. E. (1983) The sites of labeling of the β-subunit of bovine
mitochondrial F1-ATPase with 8-azido-ATP. J. Biol.
Chem. 258, 9307-9313.

(70) Drake, R. D., Evans, R. K., Wolf, M. J., and Haley, B. E. (1989)
Synthesis and properties of 5-azido-UDP-glucose. J. Biol.
Chem. 264, 11923-11933.

(71) Wallace, C. J. A. and Harris, D. E. (1984) The preparation
of fully N-ε-acetimidylated cytochrome c. Biochem. J. 217,
689-694. 

(72) Grossberg, A. L. and Pressman, D. (1963) Effect of acety- 
lation on the active site of several antihapten antibodies: Fur- 
ther evidence for the presence of tyrosine in each site. Bio- 
chemistry 2, 90-96. 

(73) Buttkus, H., Clark, J. R., and Feeney, R. E. (1965) Chemical
modifications of amino groups of transferrins:
ferrin, human serum transferrin and human lactotransfer- 
rin. Biochemistry 4, 998-1005. 

(74) Haynes, R., Osuga, D. T., and Feeney, R. E. (1967) Modification
of amino groups in inhibitors of proteolytic enzymes.
Biochemistry 6, 541-547.

(75) Huynh, Q. K. (1988) Evidence for a reactive γ-carboxyl
group (Glu-418) at the herbicide glyphosate binding site of
5-enolpyruvylshikimate-3-phosphate synthase from Escherichia
coli. J. Biol. Chem. 263, 11631-11636.

(76) Rogers, T. B., Borresen, T., and Feeney, R. E. (1978) Chemical
modification of the arginines in transferrins.
istry 17, 1105-1109. 

(77) Riordan, J. F. (1973) Functional arginine residues in
carboxypeptidase A: modification with butanedione.
Biochemistry 12, 3915-3923.

(78) Yamasaki, R. B., Vega, A., and Feeney, R. E. (1980) Modification
of available arginine residues in proteins by
p-hydroxyphenylglyoxal. Anal. Biochem. 109, 32-40.

(79) Kasher, J. S., Allen, K. E., Kasamo, K., and Slayman, C. 
W. (1986) Characterization of an essential arginine residue 
in the plasma membrane H^-ATPase of Neurospora crassa. 
J. BioL Chem. 261, 10808-10813. 

(80) Melchior. W. B. and Fahmey, D. (1970) Ethoxyformyla- 
tion of proteins. Reaction of ethoxyformic anhydride with 
o-chymotrypsin, pepsin and pancreatic ribonuclease at pH 
4. Biochemistry 9, 251-258. 

(81) Dominici, P,, Tancini. B., and Voltattomi, C. B. (1985) 
Chemical modification of pig kidney 3,4-dihydroxyphenyla- 
lanine decarboxylase with diethyl pyrocarbohate. J. BioL 
Chem. 260, 10583-10589. 

(82) Spande, T. F. and Witkop. B. (1967) Tryptophan involve- 
ment in the function of enzymes and protein hormones as 
determined by selective oxidation with TV-bromosucinimide. 
Methods EnzymoL 11, 606-521. 

(83) Horton, H. R, and Koshland„D. E. (1972) Modification of 
proteins with active benzylhalidea. Methods EnzymoL 25, 
468-482. 

(84) Horton, H. R. and Koshland, D. E. (1967) Reactions with 
reactive alkylhalides. Methods EnzymoL 11, 556-565. 

(85) Morrison, M. (1970) lodination of tyrosine: Isolation of lac- 
toperoxidase (bovine). Methods EnzymoL 17, 653-664. 

(86) Sinn, H. J., Schrank, H. H., Friedfick, E. A., Via, D. P., 
and Dresel, H. A. (1988) Radioiodination of proteins and 
lipoproteins using N-bromosuccinimide as oxidizing agent. 
AnaL Biochem. 170, 186-192. 

(87) Sokolovsky, M,. Riordan, J. F.. and Vallee, B. L. (1966) 
Tetranitrome thane. A reagent for the nitration of tyrosyl 
residues in proteins. Biochemistry 5, 3582-3589, ' 

(88) Brake, J. M. and Wold, F. (1962) Carboxymethylation of 
yeast enolase. Biochemistry 1, 386-391. 

(89) Crestfield, A. M., Moore, S.. and Stein, W. H. (1963) The 
preparation and enzymatic hydrolysis of reduced and S-car- 
boxymethylated proteins, J. BioL Chem. 238, 622-627, 

(90) Markham, G. D. and Satishchandran, C. (1988) Identifi- 
cation of the reactive sulfhydryl groups of 5-adenosylmethio- 
nine synthetase. J, BioL Chem. 263, 8666-8670. 

(91) Lewis, C. T.. Seyer, J. M., and Carlson, G. M. (1989) Cys- 
teine 288: An essential hyperreactive thiol of cytosolictphoa- 
phoenol pyruvate car boxy kinase (GTP). J. BioL Chem. 264, 
27-33. 

(92) Fujioka, M., Takata, Y., Konishi, K., and Ogawa, H. (1987) 
Function and reactivity of sulphydryl groups of rat liver gly- 
cine methyltransferse. Biochemistry 26, 5696-6702. 

(93) Steuffer, C. E. and Etson, D. (1969) The effect on subtili- 
sin activity of oxidizing a methionine residue. J. BioL 
Chem. 244, 5333-6338. 

(94) Fujioka, M., Takata, Y., Konishi, K., and Ogawa, H. (1987) 
Function and reactivity of sulfhydryl groups of rat liver gly- 
cine methyltransferase. Biochemistry 26, 6696-5702. 

(95) Degani, Y., Neumann, H., and Patchornik, A. (1970) Selec- 
tive cyanylation of sulfhydryl groups, J. Am. Chem. Soc. 92, 
6969-6976. 

(96) Dagani, Y. and Patchornik, A. (1974) Cyanylation of sulf- 
hydryl groups by 2-nitro-5-thiocyanobenzoic acid. High yield 
modification and cleavage of peptides at cysteine residues. 
Biochemistry 13, 1-11. 

(97) Kindman, L. A. and Jencks, W. P. (1981) Modification and 
inactivation of CoA transferase by 5-nitro-5-(thiocyan- 
ato)benzoate. Biochemistry 20, 5183-5187. 

(98) Smith, D. J. and Kenyon, G. L. (1974) Nonessentiality of 
the active sulfhydryl group of rabbit muscle creating kinase. 
J. BioL Chem. 249, 3317-3318. 

(99) Hettinger, T. P. and Harbury, H. A. (1965) Guanidinated 
cytochrome c. Biochemistry 4, 2685-2589. 

(100) Shields, G. S., Hill, R. L., and Smith, E. L. Preparation 



2. MEANS AND FEENEY Chemt'ca^^Mfffcat/ons of Proteins 



19 



and properties of guanidinated me rcuri papain. J. BioL 
Chem. 234, 1747-1760, 

(101) Kaplan, H., Stevenson, K. J,, and Hartley. B. S. (1971) 
Competitive labelling, a method for determining the reactiv- 
ity of individual groups in proteins. Biochem. J. 124, 289- 
299. 

(102) Duggleby. K. G. and Kaplan, H. (1975) A competitive 
labeling, method for the determination of the chemical prop- 
erties of solitaiy functional groups in proteins. Biochemis- 
try 14, 5168-5175! 

(103) Shewale, J. G. and Brew, K. (1982) Effects of Fe'* bind- 
ing on the microenvironments of individual amino groups in 
human serum transferrin as determined by differential kinetic 
labeling. J. BioL Chem, 257, 9406-9415. 

(104) Ri^er, R. and Bosshard, H. R. (1980) Comparison of the 
binding sites on cytochrome c for cytochrome c oxidase, cyto- 
chrome bci arid cytochrome c^. Differential acetylation of lysyl 
residues in free and complexed cytochrome c «/. BioL 
Chem. 255, 4732-4739. 

(105) Jackson, G. E. D. and Young, N. M. (1986) Determina- 
tion of chemical properties of individual histidine and tyrosine 
residues of concanavalin A by competitive labelling with 1- 
fluoro-2,4-dihitrobenzene. Biochemistry^ 25, 1657-1662. 

(106) Buechler, J. A., Vedvick, T. A., and Taylor, S. S. (1989) 
Differential labelling of the catalytic subunit of cAMP depen- 
dent protein kinase with acetic anhydride: substrate- 
induced conformational changes. Biochemistry 28, 3018— 
3024. 

(107) Ray, W. J. and Koshland, D. E. (1961) A method for char- 
acterizing the type and numbers of groups involved in enzyme 
action. J, BioL Chem. 23B, 1973-1979. 

(108) Redkar, V. D. and Kenkare. U, W.. (1975) Effects of Uganda 
on the reactivity of essential sulfhydryls in brain hexoki- 
nase. Possible interaction between, substrate binding sites. 
Biochemistry 14, 4704-4712. 

(109) Horiike, K-.Tsuge, H.. and McCorraick, D. B. (1979) Evi- 
dence for an essential histidyl residue, at the active site of 
pyridoxamine(pyridoxine)-5'-phosphate oxidase from rabbit 
liver. J. BioL Chem. 264, 6638-6643. 

(110) Jimenez, J. S., K\ipfer, A., Gani, V., and Shaltiel, S. (1982) 
Conformational changes in the catalytic subunit of adenos- 
ine cyclic 3',5'-phosphate dependent protein kinase. Use for 
establishing a connection between one sulfhydryl group and 
the 7-subsite in the ATP site of this subunit. Biochemistry 
21, 1623-1630. 

(111) Ogawa, H., Okamoto, M., and Fujioka, M. (1979) Chem- 
ical modification of the active site sulfhydryl group of sac- 
charopine dehydrogenase (l- lysine- forming). J. BioL Chem. 
254,7030-7035. 

(112) First, E. H. and Taylor, J. J. (1989) Selective modifica- 
tion of the catalytic subunit of cAMP-dependent protein kinase 
with sulfhydryl-specific fluorescent probes. Biochemistry 

^ 28, 3598-3605. 

(113) Makinen, A. L. and Nowak, T. (1989) A reactive cysteine 
in avian liver phosphoenolpyruvate carboxykinase. J. BioL 
Chem. 264, 12148-12157. 

(114) Tsou, Chen-Lu (1962) Kinetic determination of essential 
side chains in proteins. Sci. Sin. 11, 1635-1558. 

(115) Kremer, A. B., Egan, R. M., and Sable, H. Z. (1980) The 
active site of transketolase two arginine residues are essen- 
tial for activity, J. BioL Chem. 255, 2405-2410. 

(116) Lakowicz, J. R. (1983) Principles of Fluorescence Spec- 
troscopy Plenum Press, New York. 

(117) Stryer, L. and Haugland, R. P. (1967) Energy transfer: A 
spectroscopic ruler. Proc. Natl. Acad. Sci. 58, 719-726. 

(118) Stryer, L. (1978) Fluorescence energy transfer as a spec- 
troscopic ruler. Ann. Rev. Biochem. 47, 819-846. 

(119) Haugland, R. P. (1989) Molecular Probes Handbook of 
Fluorescent Probes and Research Chemicals Molecular Probes, 
Inc.; Eugene, GR. 

(120) Berliner, L. J. (1976) Spin Labels Academic Press, New 
York. 

(121) Wold, F. (1972) Bifunctional reagents. Methods Enzy- 
moL 25, 623-651. 



(122) Wang, K. and Richards, F, M. (1974) An approach to near- 
est neighbor analysis of membrane proteins. J, BioL Chem. 
249, 8005-8018. 

(123) Uy, R. and Wold, F. (1977) Introduction of artificial 
crosslinks into proteins. Adv. Exp. Med, BioL 86 A, 169- 
186. 

(124) Das, M. and Fox. C. F. (1979) Chemical cross-linking in 
biology. Annu. Rev. Biophys. Bioeng: 8, 165-193. 

(125) Ji, T. H. (1983) Bifunctional reagents. Methods Enzy- 
moL 91, 580-609. 

(126) Kennedy, J. F. and Cabral, J. M. S. (1983) In Solid Phase 
Biochemistry (W. H. Scouten, Ed.) pp 253-392, John Wiley, 
New York. ' - 

(127) Laskin, Al I. (1985) Enzymes and Immobilized Cells in 
Biotechnology Benjamin/Cunmiings,- Inc., Menlo Park, CA. 

(128) Hartmeir, W. (1986) Immobilized Biocatalysts Springer- 
Verlag, New York. 

(129) Mosbach, K. (1987) Immobilized enzymes and cells, part 

B. Methods EnzymoL 135. 

(130) Mosbach, K. (1987) Immobilized enzymes and cells, part 

C. Methods EnzymoL 136. 

(131) Weare, J, A. and Reichert, I. E. (1979) Studies with car- 
bodiimide-cross-linked derivatives of bovine lutropin. I. The 
effects of specific group modificatipns on receptor site bind- 
ing in testes, J. BioL Chem, 254, 6964-6971. 

(132) Waldmeyer, B. and Bosshard. H. R. (1985) Structure of 
an electron transfer complex. I. Covalent cross-linking of 
cytochrome c peroxidase and cytochrome c. J. BioL Chem. 
260, 5184-5190. 

(133) Willing, A. H.. Georgiadis, M. M., Rees, D. C, and Howard, 
J. B. (1989) Cross-linking of nitrogenase components struc- 
ture and activity of the covalent complex. J. BioL Chem. 
264, 8499-8603. 

(134) korodi, I.. Asboth, B., and Polgar, L. (1986) Disulfide 
bond formation between the active-site thiol and one of the 

J several free thiol groups of chymopapain- Biochemistry 25, 
6895-6900. 

(136) Huston, E. E., Grammar, J. C, and Yount, R. G. (1988) 
Flexibility of the myosin heavy chain — Dirept evidence that 
the region containing SH^ and SH2 can move 10 A under the 
influence of nucleotide binding. Biochemistry 17, 8945- 
8952. 

(136) Hiratsuka, T. (1988) Cross-linking of three heavy chain 
domains of myosin adenosinetriphosphatase with a trifunc- 
tional alkylating reagent. Biochemistry 27, 4110^4114. 

(137) Peters, K. and Richards, F. M. (1977) Chemical cross- 
linking: Reagents and problems in studies of membrane struc- 
ture. Ann. Rev, Biochem. 46, 523-551. 

(138) Davies, G. E. and Stark, G. R. (1970) Use of dimethyl- 
suberimidate, a cross-linking reagent, in studying the sub- 
unit structure of oligomeric proteins. Proc. Na.tL Acad. Sci. 
U.S.A. 66, 651-656. 

(139) Dombradi, V., H^du, J., Bot, G., and Friedrich, P. (1980) 
Structural changes in glycogen phosphorylase as revealed by 
cross-linking with bifunctional diimidates:* phospho- 
dephospho hybrid and phosphorylase a. Biochemistry 19, 
2295-2299. 

(140) Pilch, P. E. and Czech, M. P. (1979) Interaction of cross- 
linking agents with the insulin effector system of isolated cells. 
J. BioL Chem. 254, 3375-3381. 

(141) Staros, J. V., Lee, W. T„ and Conrad, D, H. (1988) Mem- 
brane impermeant crosslinking reagents application to stud- 
ies of the cell surface receptor for IgE, Methods EnzymoL 
150, 503-512. 

(142) Heilman, H. D. and Holzner, M. (1981) The spatial orga- 
nization of the active sites of the bifunctional oligomeric enzyme 
tryptophan synthetase: Crosslinking by a novel method. Bio- 
chem, Biophys. Res. Commun. 99, 1146-1152. 

(143) Sato, S. and Nakao, M. (1981)* Cross-linking of intact eryth- 
rocyte membranes with a newly synthesized cleavable bifunc- 
tional reagent. J. Biochem. {Tokyo) 90, 1177-1181. 

(144) Moore, J. E. and Ward, W. H. (1956) Cross-linking of 
bovine plasma albumin and wool keratin. J. Am. Chem. 
Soc. 78, 2414-2418. • 



20 



PERSP^^BS IN BIOCONJUGATE CHEMISTRY 



(145) Hillel. Z. and Wu, C.-W. (1977) Subunit topography of 
RN A polymerase from Escherichia coli. A cross-linking study 
with bifunctional reagents. Bichemistry 16; 3334-3342. 

(146) Hingorani, V. N.» Tobias, D. T., Henderson, J. T.. and 
Ho. Y.-K- (1988) Chemical crosslinking of bovine retinal trans- 
ducin and cGMP phosphodiesterase. J. BioL Chem, 263, 6916- 
6926. 

(147) Kitagawa, T. and Aikawa, T. (1976) Enzyme coupled imnau- 
noassay of insulin using a novel coupling reagent J. Bio- 
chem. {Tokoyo) 79/233-236. 

(148) Youle, R. J. and Neville. D. M. (1980) Anti-thy 1.2 mon- 
oclonal antibody linked to ricin is a potent cell- type-specific 
toxin. Froc. Natl, Acad, Sci. U,S,A, 77, 6483-5486. 

(149) Yoshitake/S., Yamada, Y., Ishikawa, E.» and Masseyeff, 
R. (1979)- Conjugation of glucose oxidase from Aspergillus 
niger and rabbit antibodies using N-hydroxysuccinimide ester 
of 7V-(4-carboxycyclohexylmethyl)maleimide. Eur. J. Bio- 
chem, 101, 395-399. 

(160) Lambert, J. M., Senter, P. D., Young, A. Y. Y., Blattler, 
W. A., and Goldmacher, V. S. (1985) Purified immunotoxins 
that are reactive with human lymphoid cells. J. Biol. Chem. 
260, 12035-12041. 

(151) Carlsson, J., Dreyin, H., and Axen, R. (1978) Protein thi- 
olation and reversible protein-protein conjugation N- 
succinimidyl.3(2-pyridyldithio) propionate, a new heterobi- 
functional reagent. Biochem. J. 173, 723-737. 

(152) O'Keefe, D. O. and Draper, R. K. (1985) (Characteriza- 
tion of a transferrin-diphtheria conjugate. J, Biol* Chem. 
260, 932-937. 

(153) Jue, R., Lambert, J. M., Pierce, L. R., and Traut, R. R. 
(1978) Addition of sulfhydryl groups to Escherichia coli ribo- 
Bomes by protein modification with 2-iminothiolane (methyl 
4-mercaptobutyrimidate). Biochemistry 17, 5399-5406. 

(154) Marsh, J. W. (1988) Antibody-mediated routing of diph- 
theria toxin in murine cells results in a highly efficacious immu- 
notoxin. J. Biol. Chem. 263, 15993-15999. 

(155) Cover, J. R,, Lambert, J. M., Norman, C. M., and Traut, 
R. R, (1981) Identification of proteins at the subunit inter- 
face of the Escherichia coli ribosome by cross-linking with 
dimethyl 3,3'-dithiobi3(propionimidate). Biochemistry 20, 
2843-2852. 

(156) Staros, J. V., Wright, R. W„ and Swingle, D. M. (1986) 
Enhancement by N-hydroxysuIfosuccinimide of water-solu- 
ble carbodiimide modified coupling reactions. Anal. Bio- 
chem. 156, 220-222. 

(157) Koyama, Y. and Taniguchi, A. (1986) Studies on chitin 
X. Homogeneous cross-linking of chitosan for enhanced cupric 
ion adsorption. J. Appl. Polymer. Sci. 31, 1951-1954. 

(158) Colander. C.-G. and Eriksson, J. C. (1987) ESCA studies 
of the adsorption of polyethyleneimine and glutaraldehy de- 
reacted polyethyleneimine on polyethylene and mica sur- 
faces. J. Colloid Interface Sci, 119, 38-48. 

(159) Korn, A. H., Feairheller, S. H., and Filachione, E. M., 
Glutaraldehyde: Nature of the reagent. J. Mol. Biol. 65, 525- 
529. 

(160) Kirkeby, S., Jakobsen, P., and Moe, D. (1987) 
Glutaraldehyde— **pure and impure". A spectroscopic inves- 
tigation of two commercial glutaraldehyde solutions and their 
reaction products with amino acids. Anal. Lett. 20, 303- 
315. 

(161) Gregory, J. D. (1955) The stability of iV-ethylmaleimide 
and its reaction with sulfhydryl groups. J. Am. Chem. Soc. 

, 77,3922-3923. 

(162) Knight. P. (1979) Hydrolysis of p-iV.N'-phenylenebisma- 
leimide and its adducts with cysteine. Biochem. J, 179. 191- 
197. 

(163) Staros, J. V. (1982) TV-Hydroxysulfosuccinimide active 
esters: Bis (//- hydroxys ulfosuccimide) esters of two dicarbox- 
ylic acids are hydrophilic, membrane-impermeant, protein 
cross-linkers. Biochemistry 21, 3940-3955. 

(164) Yoshitake, S., Imagawa. M., Ishikawa, E.. Niitsu, Y., Urush- 



izaki, I., Nishiura, M., Kanazawa, R., Kurosaki, H., Tachibana, 
S., Nakazawa, N., and Ogawa, H. (1982) Mild and efficient 
conjugation of rabbit Fab' and horseradish peroxidase using 
a maleimide compound and its use for enzyme immunoas- 
say. J. Biochem. 92, 1413-1424. 

(165) Galardy, R. E., Craig, L. C, Jamieson, J. D., and Printz, 
M. P. (1974) Photoaffinity labeling of peptide hormone bind- 
ing «ites. J. Biol. Chem, 249, 3510-3518. 

(166) Wood, C. L. and O'Dorisio, M. S. (1986) Covalent cross- 
linking of vasoactive intestinal polypeptide to its receptors 
on intact human lymphoblasts. «/. Biol. Chem. 260, 1243- 
1247. 

(167) Schmitt, M., Painter, R. G., Jesaitis, A. J., Preissner, K., 
Sklar, L. A., and Cochrane, C. G. (1983) Photoaffinity label- 
ing of the iV-formyl peptide receptor binding site of intact 
human polymorphonuclear leukocytes. J. Biol. Chem, 258, 
649-664. 

(168) Benesch, R. and Benesch. R B. (1956) Formation of pep- 
tide bonds by aminolysis of homocysteine thiolactones. J, 
Am. Chem. Soc. 78, 1597-1599. 

(169) Klotz, I. M. and Heiney, K. E. (1962) Introduction of sulf- 
hydryl groups into proteins using acetylmercaptosuccinic anhy- 
dride. Arch. Biochem. Biophys. 96, 606-612. 

(170) Julian, R., Duncan, S., Weston, P. D., and Wriggles- 
worth, R. (1983) A new reagent which may be used to intro- 
duce sulfhydryl groups into proteins, and its use in the prep- , 
aration of conjugates for immunoassay. Anal, Biochem. 132, 
68-73. 

(171) Gitman, A. G., Kahane, L, and Loyter, A., (1985) Use of 
virus-attached antibodies or insulin molecules to mediate fusion 
between sendai virus envelopes and neuraminidase- treatefl 
cells. Biochemistry 24, 2762-2768. 

(172) Gordon, R. D., Fieles, W. E., Schotland, D, L., Hogue- 
Angeletti, R., and Barchi, R. L. (1987) Topographical local- 
ization of the C-terminal regions of the voltage-dependent 
sodium channel from Electrophorus electricus using antibod- 
ies raised against a synthetic peptide. Proc. Natl, Acad. Sci. 
U.S.A. 84, 308-312. 

(173) Senter, P. D., Saulnier, M. G., Schreiber, G. J., Hirsch- 
berg, D. L., Brown, J. P., Hellstrom, I., and Hellstrom, K, E. 
(1988) Anti- tumor effects ot antibody-alkaline phosphatase 
conjugates in combination with etoposide phosphate. Proc. 
Natl. Acad. Sci. U.S.A. 85, 4842-4846. 

(174) Hirs, . C, H- W. (1967) Protein structure. Methods 
Enzymol. II. 

(175) Hira, C. H. W. and Timasheff, S. N. (1983) Enzyme struc- 
ture, part I. Methods Emymol. 91. 

(176) Baker, B. R. (1967) Design of Active-Site-Directed Irre- 
versible Enzyme Inhibitors Wiley-Interacience, New York. 

(177) Means, G. E. and Feeney, R. E. (1971) Chemical Modi- 
fication of Proteins Holden-Day, San Francisco, CA. 

(178) Glazer, A. N., Delange, R. J., and Sigman, D. S. (1975) 
Chemical. Modification of Proteins. Laboratory Tech- 
niques in Biochemistry and Molecular Biology (T. S. Work 
and E. Work, Eds.) American Elsevier Publishing Co.. New 
York. 

(179) Lundblad, R. L. and Noyes, C. M. (1984) Chemical 
Reagents for Protein Modification Vols. 1 and 2, CRC Press, 
Boca Raton, FL. 

(180) Widder, K. J. and Green, R. (1985) Drug and enzyme 
targeting, Part A, Methods Enzymol, 112. 

(181) Pfleiderer, G. (1985) Chemical Modifications of Pro- 
teins. In Modern Methods in Protein Chemistry (H. 
Tschesche, E^.) Walter de Gryter, Berlin and New York. 

(182) Feeney, R. E. (1987) Chemical modification of proteins: 
Comments and perspectives. Int. J. Pept. Protein Res, 27, 
146-161. 

(183) Eyzaguirro, J. (1987) Chemical Modification of 
Enzymes: Active Site Studies John Wiley and Sons, New 
York. 





Exhibit 24 



The Journal of Biological Chemistry

© 2000 by The American Society for Biochemistry and Molecular Biology, Inc.

Vol. 275, No. 10, Issue of March 10, pp. 7239-7248, 2000

Printed in U.S.A.



Re-engineering of Human Urokinase Provides a System for 
Structure-based Drug Design at High Resolution and Reveals 
a Novel Structural Subsite* 

(Received for publication, September 17, 1999, and in revised form, November 30, 1999) 
Vicki Nienaber‡§, Jieyi Wang¶, Don Davidson¶, and Jack Henkin¶

From the Departments of ‡Structural Biology and ¶Cancer Research, Abbott Laboratories, Abbott Park, Illinois 60064



Inhibition of urokinase has been shown to slow tumor 
growth and metastasis. To utilize structure-based drug 
design, human urokinase was re-engineered to provide 
a more optimal crystal form. The redesigned protein 
consists of residues Ile16-Lys243 (in the chymotrypsin
numbering system; for the urokinase numbering system
it is Ile159-Lys404) and two point mutations, C122A and
N145Q (C279A and N302Q). The protein yields crystals
that diffract to ultra-high resolution at a synchrotron
source. The native structure has been refined to 1.5 Å
resolution. This new crystal form contains an accessible
active site that facilitates compound soaking, which was
used to determine the co-crystal structures of urokinase
in complex with the small molecule inhibitors amiloride,
4-iodo-benzo(b)thiophene-2-carboxamidine and phenylguanidine
at 2.0-2.2 Å resolution. All three inhibitors
bind at the primary binding pocket of urokinase. The
structures of amiloride and 4-iodo-benzo(b)thiophene-2-carboxamidine
also reveal that each of their halogen
atoms is bound at a novel structural subsite adjacent
to the primary binding pocket. This site consists of residues
Gly218, Ser146, and Cys191-Cys220 and the side chain
of Lys143. This pocket could be utilized in future drug
design efforts. Crystal structures of these three inhibi- 
tors in complex with urokinase reveal strategies for the 
design of more potent nonpeptidic urokinase inhibitors. 



Cancer cell invasion, the spread and growth of tumor metastases,
is a primary cause of mortality and morbidity of malignancy
(2), and this invasion requires the degradation of basement
membranes and other extracellular protein structures.
Urokinase has been shown to be strongly associated with tu- 
mor cells (3) and to play a role in basement membrane degra- 
dation via a cascade mechanism involving activation of plas- 
minogen and the metalloproteases (4-6). Furthermore, 
inhibitors of urokinase have been reported to slow tumor me- 
tastasis as well as growth of the primary tumor (7-15). These 
inhibitors include the small molecules 4-iodo benzo(b)thiophene-2-carboxamidine
(B428),¹ 4-benzodioxolanyletheyl benzo(b)thiophene-2-carboxamidine
(B623) (12-14), and amiloride
(8, 15). These compounds are competitive inhibitors of uroki- 



* The costs of publication of this article were defrayed in part by the 
payment of page charges. This article must therefore be hereby marked
"advertisement" in accordance with 18 U.S.C. Section 1734 solely to
indicate this fact. 

§ To whom correspondence should be addressed: Dept. of Structural 
Biology, Abbott Laboratories, D46Y/AP10-LL, 100 Abbott Park Rd., 
Abbott Park, IL 60064-6098. Tel.: 847-935-0918; Fax: 847-937-2625; 
E-mail: vicki.nienaber@abbott.com. 

¹ The abbreviations used are: B428, 4-iodo-benzo(b)thiophene-2-carboxamidine;
B623, 4-benzodioxolanyletheyl benzo(b)thiophene-2-carboxamidine;
LMW, low molecular weight; S2444, H-D-pyroglutamyl-Gly-L-Arg-p-nitroanilide.



nase and have been proposed to bind at the primary binding 
pocket common to all trypsin-like serine proteases (15). However,
none of these compounds possess all of the characteristics
of a good therapeutic agent for the treatment of cancer. 

Structure-based drug design has become an important tool 
for improving the potency and pharmacological characteristics 
of compounds toward providing therapeutic agents. This 
method has contributed to the development of potent and spe- 
cific inhibitors for many targets such as HIV protease, cy- 
clooxygenase-2, influenza neuraminidase, and the metallopro- 
teinases (16-22). To most efficiently apply crystallography- 
driven structure-based drug design, it is preferable that the 
crystals have certain properties. One property is that the active
site of the target is open in the crystal lattice. This molecular
packing permits the diffusion and binding of compounds into
the active site and eliminates the need to optimize crystal
growth in the presence of each inhibitor. Another important
property is that the crystals reproducibly diffract to high resolution
(2.5-2.0 Å). It is preferable that this data quality is
achievable on a conventional rotating anode source, thereby 
eliminating the need for travel to synchrotron facilities. The 
higher resolution data facilitate unambiguous map interpreta- 
tion and minimize the average atomic positional error (23). 
Hence, an appropriate crystal form can greatly facilitate the 
process of structure-based drug design. A crystal system exists 
for urokinase, although it does not fully encompass the pre- 
ferred properties outlined above. 
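
As a rough illustration of what these resolution figures mean in a diffraction experiment, Bragg's law (λ = 2d sin θ) relates the resolution d to the scattering angle 2θ. The sketch below assumes Cu Kα radiation (1.5418 Å), which is typical of a rotating anode source but is not stated in the text; the function names are hypothetical.

```python
import math

# Assumption: Cu K-alpha wavelength in angstroms; the text does not give the wavelength.
CU_K_ALPHA_ANGSTROM = 1.5418


def two_theta_deg(d_angstrom, wavelength=CU_K_ALPHA_ANGSTROM):
    """Scattering angle 2-theta (degrees) for a d-spacing, from Bragg's law."""
    sin_theta = wavelength / (2.0 * d_angstrom)
    if not 0.0 < sin_theta <= 1.0:
        raise ValueError("d-spacing is not reachable at this wavelength")
    return 2.0 * math.degrees(math.asin(sin_theta))


def d_spacing(two_theta, wavelength=CU_K_ALPHA_ANGSTROM):
    """Inverse relation: d-spacing (angstroms) observed at a given 2-theta (degrees)."""
    return wavelength / (2.0 * math.sin(math.radians(two_theta / 2.0)))


# Resolutions mentioned in the text for the LMW and micro-urokinase crystals.
for d in (3.0, 2.5, 2.0, 1.5):
    print(f"d = {d:.1f} A  ->  2-theta = {two_theta_deg(d):.1f} deg")
```

Higher resolution (smaller d) corresponds to reflections at wider angles, which is why weakly diffracting crystals lose their high-resolution data first.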

Human low molecular weight (LMW) urokinase has been 
crystallized in complex with the peptidic inhibitor Glu-Gly-Arg- 
chloromethyl ketone (1). This structure reveals the geometry of 
the urokinase active site as well as the orientation of a peptide 
inhibitor in the substrate-binding groove. However, the LMW 
urokinase crystals diffract to lower resolution (2.5 Å resolution,
synchrotron radiation; 3.0 Å resolution, rotating anode source)
and utilize co-crystallization to achieve the target-ligand com- 
plex. In addition, the active site is in close contact with another 
molecule because of a noncrystallographic 2-fold axis near the 
active site. This interaction could limit minor ligand-induced
conformational shifts and perhaps distort the active site conformation.
Furthermore, the noncrystallographic and crystallographic
packing effectively blocks the active site such that it
would be difficult to diffuse small molecules into the active site
in this crystal form (if they were not blocked by the irreversible 
covalent inhibitor). Hence, although this system may be used 
for modeling of small molecule urokinase inhibitors, it may not 
provide an ideal system for structure-based drug design. There- 
fore, to design an anti-cancer therapeutic, a new crystal form of 
human urokinase was sought to facilitate the application of 
structure-based drug design. The strategy utilized protein en- 
gineering and information from the reported LMW urokinase 
structure to design an altered protein sequence to yield a new 
crystal form. 



This paper is available on line at http://www.jbc.org






The new form of urokinase, micro-urokinase, crystallizes 
under conditions very similar to the low molecular weight form 
(1), although crystal packing and data quality are very different.
This new crystal form contains a monomer in the asymmetric
unit and diffracts to ultra-high resolution (~1.03 Å).
In addition, this crystal form has an open active site permitting
direct diffusion of compounds into the apo-crystals and
is therefore ideal for providing precise structure determina- 
tions for urokinase ligand complexes by the soaking technique. 

The re-engineered crystal system and soaking technique 
were utilized to determine the co-crystal structure of urokinase 
in complex with a series of small molecule inhibitors at 2.0 or 
2.2 Å resolution. Two of these inhibitors, amiloride (24), and
B428 (25, 26), have been shown to reduce tumor size and 
metastasis (8, 12-15), whereas the effect of the third, phenyl- 
guanidine (27) has not been reported to date. These complex 
structures were completed to determine the binding orienta- 
tion of each compound to urokinase. This information in turn 
may be utilized to design molecules of increased potency to- 
ward discovery of an anti-cancer therapeutic compound. 

EXPERIMENTAL PROCEDURES 

Recombinant Micro-urokinase — Micro-urokinase was engineered by 
polymerase chain reaction manipulations using a human urokinase 
cDNA as a template (28). The C279A and N302Q mutations were made 
by the method of polymerase chain reaction based site-directed mutagenesis.
Urokinase native leader sequence was fused directly to Ile159
by polymerase chain reaction. This product was ligated to a baculovirus
transfer vector pJVP10z (29). The final expression vector sequence was
confirmed by DNA sequencing. 

The pJVP10z-micro-urokinase vector was transfected into Sf9 cells
by the calcium phosphate precipitation method using the BaculoGold 
kit from PharMingen (San Diego, CA). Single recombinant virus ex- 
pressing micro-urokinase was plaque purified by standard methods, 
and a large stock of the virus was prepared. Large scale expression of 
micro-urokinase was performed in suspension in High-Five cells (Invitrogen,
San Diego, CA) growing in Excel 405 serum free medium (JRH
Biosciences, Lenexa, KS) at 27 °C. Urokinase activity in the supernatant
was measured by amidolysis of the chromogenic urokinase substrate
H-D-pyroglutamyl-Gly-L-Arg-p-nitroanilide (S2444; Helena Laboratories,
Beaumont, TX). The culture supernatant was harvested as
the starting material for purification. Protease inhibitors, iodoacet- 
amide (10 mM), benzamidine (5 mM), and EDTA (1 mM) were added to 
the pooled culture medium. The medium was diluted 5-fold with 5 mM 
HEPES, pH 7.5, and filtered through 1.2- and 0.2-µm membranes. The
micro-urokinase protein was captured onto Sartorius membrane adsorber
S100 (Sartorius, Edgewood, NY) by passing the medium through
the membrane at a flow rate of 50-100 ml/min. After extensive washing
with 10 mM HEPES, pH 7.5, containing 10 mM iodoacetamide, 5 mM
benzamidine, and 1 mM EDTA, micro-urokinase was eluted from S100
membrane with a NaCl gradient (20-500 mM, 200 ml) in 10 mM HEPES
buffer, pH 7.5, 10 mM iodoacetamide, 5 mM benzamidine, 1 mM EDTA.
The eluate was diluted 10-fold with the above 10 mM HEPES buffer
containing inhibitors, and loaded onto a S20 column (Bio-Rad). Micro-urokinase
was eluted with a 20x column volume NaCl gradient (20-500 mM).
No inhibitors were used in the elution buffers. The eluate was
then diluted 5-fold with 10 mM HEPES buffer, pH 7.5, and loaded onto
a heparin-agarose (Sigma) column. Micro-urokinase was eluted with a
NaCl gradient from 10-250 mM. The heparin column eluate of micro-urokinase
was applied to a benzamidine-agarose (Sigma) column equilibrated
with 10 mM HEPES buffer, pH 7.5, 200 mM NaCl. The column
was washed with the equilibration buffer, and the urokinase was eluted
with 50 mM NaOAc, pH 4.5, 500 mM NaCl. The micro-urokinase eluate
was concentrated to 4 ml by ultrafiltration and applied to a Sephadex
G-75 column equilibrated with 20 mM NaOAc, pH 4.5, 100 mM NaCl.
The single peak containing micro-urokinase was collected and lyophilized
as the final product.
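
The salt arithmetic behind the gradient and dilution steps above can be sketched as follows. This is an illustration only: it assumes the 20-500 mM NaCl gradient is linear over its 200 ml volume (the text does not say so) and that the diluting buffer contributes no NaCl. The function names are hypothetical.

```python
def linear_gradient_conc(volume_ml, start_mM=20.0, end_mM=500.0, total_ml=200.0):
    """NaCl concentration (mM) at a given elution volume, assuming a linear gradient."""
    if not 0.0 <= volume_ml <= total_ml:
        raise ValueError("volume lies outside the gradient")
    return start_mM + (end_mM - start_mM) * volume_ml / total_ml


def dilute(conc_mM, fold):
    """Concentration after an n-fold dilution with a diluent free of the solute."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return conc_mM / fold


# Hypothetical example: a fraction eluting halfway through the gradient,
# then diluted 10-fold before loading onto the next column.
salt_at_midpoint = linear_gradient_conc(100.0)
print(f"NaCl at gradient midpoint: {salt_at_midpoint:.0f} mM")
print(f"NaCl after 10-fold dilution: {dilute(salt_at_midpoint, 10):.0f} mM")
```

The dilution steps between columns serve to bring the salt concentration back down so that the protein can bind the next ion-exchange or affinity matrix.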

Amidolytic Kinetics of Urokinase and Micro-urokinase — The effects 
of synthetic inhibitors on the steady state amidolytic activity of LMW 
urokinase or micro-urokinase toward the chromogenic substrate, S2444 
(Helena Laboratories), were characterized by the formation of p-nitroaniline
(30). Briefly, 0-50 µM concentrations of inhibitors were tested
against 25 IU/ml (0.14 ng/ml) LMW urokinase or micro-urokinase and
0.4-4.0 mM concentrations of S2444 in 200 µl volumes in phosphate-buffered
saline and 0.01% bovine serum albumin, pH 7.4. Incubations
were performed at 37 °C with absorbance at 405 nm recorded every 11 s
for 20 min. Data were plotted as 1/S versus 1/v for Lineweaver-Burk
analysis and the calculation of inhibition constants. Ki values were
obtained from replots of the resultant slopes versus I (26, 31).
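
The slope-replot procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: it assumes purely competitive inhibition, for which the Lineweaver-Burk slope is (Km/Vmax)(1 + [I]/Ki), so a replot of slope against [I] is a straight line whose intercept divided by its slope equals Ki. The kinetic constants and rate data below are synthetic, hypothetical values.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lineweaver_burk_slope(substrate, rates):
    """Slope of the double-reciprocal plot (1/v against 1/S)."""
    slope, _ = linear_fit([1.0 / s for s in substrate], [1.0 / v for v in rates])
    return slope


def estimate_ki(substrate, rates_by_inhibitor):
    """Ki from the replot of Lineweaver-Burk slopes against inhibitor concentration."""
    inhibitor = sorted(rates_by_inhibitor)
    slopes = [lineweaver_burk_slope(substrate, rates_by_inhibitor[i]) for i in inhibitor]
    replot_slope, replot_intercept = linear_fit(inhibitor, slopes)
    return replot_intercept / replot_slope


def competitive_rate(s, i, vmax, km, ki):
    """Michaelis-Menten rate in the presence of a competitive inhibitor."""
    return vmax * s / (km * (1.0 + i / ki) + s)


# Synthetic, noise-free data; Vmax, Km and Ki are hypothetical values.
substrate_mM = [0.4, 0.8, 1.6, 4.0]
inhibitor_uM = [0.0, 5.0, 10.0, 25.0, 50.0]
rates = {
    i: [competitive_rate(s, i, vmax=1.0, km=0.2, ki=5.0) for s in substrate_mM]
    for i in inhibitor_uM
}
ki_estimate = estimate_ki(substrate_mM, rates)
print(f"estimated Ki = {ki_estimate:.3f} uM")
```

With real absorbance data, the initial rates would first be taken from the linear portion of each progress curve, and the quality of both linear fits should be inspected before trusting the resulting Ki.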

Protein Crystallography — Crystals were obtained by the hanging 
drop vapor diffusion method. A typical well solution of 0.15 M Li2SO4,
20% polyethylene glycol MW 4000 in succinate buffer, pH 4.8-6.0, was
used. On the coverslip, 2 µl of well solution is mixed with 2 µl of protein
solution, and the slip is sealed over the well. Crystallization occurred at
18-24 °C within 24 h. The protein solution was composed of 6 mg/ml
(0.21 mM) micro-urokinase in 10 mM citrate, pH 4.0, 3 mM ε-amino
caproic acid p-carbethoxyphenyl ester chloride with 1% Me2SO co-solvent.
The resultant micro-urokinase crystals are composed of enzyme
with an empty active site. The compound ε-amino caproic acid
p-carbethoxyphenyl ester chloride is reported to inhibit urokinase with
an apparent Ki of 0.3 µM at neutral pH and was co-crystallized with
urokinase in an attempt to obtain a complex structure (32). Repeated
tests with this compound resulted in a structure with an active site
occupied only by ordered solvent molecules even at 1.5 Å resolution.
Hence, we have hypothesized that this inhibitor is degraded during the 
crystallization experiment albeit critical for obtaining urokinase crys- 
tals. Studies are underway to try to understand the mechanism of this 
phenomenon. 

The micro-urokinase crystals belong to the space group P2₁2₁2₁ with 
unit cell dimensions of a = 55.16 Å, b = 53.00 Å, c = 82.30 Å and α = 
β = γ = 90° and diffract beyond 1.5 Å on a Rigaku RTP 300 RC rotating 
anode source equipped with an RAXISII detector. In addition, a 1.03 Å 
resolution native data set was collected on a CCD detector at beam line 
F1 of the Cornell High Energy Synchrotron Source in Ithaca, NY. All 
data were collected at 100-160 K and processed by the program package 
DENZO (33). Before crystals were frozen, they were passed through 
a solution of 0.15 M Li2SO4, 20% polyethylene glycol MW 4000, succinate 
buffer, pH 4.8-6.0, and 20% glycerol for cryogenic protection. Data 
were collected at low temperature to preserve the diffraction of the 
crystal throughout data acquisition. The crystal structure was determined 
by the molecular replacement method using the program 
AMORE (34). The LMW urokinase structure was used as the search 
probe (1) (Protein Data Bank entry 1LMW) against the RAXISII data. 
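
As a consistency check on the reported cell, the unit-cell volume and Matthews coefficient can be worked out from the numbers given above. This is a sketch under stated assumptions: the molecular mass is inferred from the stated 6 mg/ml = 0.21 mM protein solution, Z = 4 follows from one monomer per asymmetric unit in P2₁2₁2₁, and the solvent fraction uses the common approximation 1 - 1.23/Vm. None of these derived values are reported in the text.

```python
# Unit cell from the text (orthorhombic, all angles 90 degrees), in angstroms.
a, b, c = 55.16, 53.00, 82.30

# Volume of an orthorhombic cell is simply a * b * c.
cell_volume = a * b * c  # cubic angstroms

# Molecular mass inferred from the stated protein solution (assumption):
# 6 mg/ml is 6 g/l, and 0.21 mM is 0.21e-3 mol/l.
molecular_mass = 6.0 / 0.21e-3  # g/mol (Da)

# P2(1)2(1)2(1) has four symmetry operators; with one monomer per
# asymmetric unit there are four molecules in the cell.
molecules_per_cell = 4

# Matthews coefficient: crystal volume per unit of protein mass.
matthews_vm = cell_volume / (molecules_per_cell * molecular_mass)

# Approximate solvent fraction (standard rule of thumb, not from the source).
solvent_fraction = 1.0 - 1.23 / matthews_vm

print(f"cell volume      : {cell_volume:.0f} A^3")
print(f"molecular mass   : {molecular_mass:.0f} Da")
print(f"Matthews Vm      : {matthews_vm:.2f} A^3/Da")
print(f"solvent fraction : {solvent_fraction:.2f}")
```

A Vm near 2.1 Å³/Da corresponds to a fairly tightly packed crystal, which is in line with the close intermolecular contacts and high-resolution diffraction described for this crystal form.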

The structure was refined to 1.5 Å resolution using the synchrotron 
data and the program package XPLOR (35) by a combination of rigid 
body, simulated annealing maximum likelihood refinement, and maximum 
likelihood positional refinement. Electron density maps to 1.5 Å 
resolution were inspected on a Silicon Graphics INDIGO2 workstation 
using the program package QUANTA 97 (Molecular Simulations, Inc). 
At 1.5 Å resolution constrained individual temperature factor refinement 
was also included in the refinement cycle. Electron density maps 
to 1.5 Å resolution were examined, and water molecules and bound ions 
were identified as positive peaks in the Fo - Fc map at least 4 σ above 
noise. Refinement continued with automatic water addition using the 
XWAT feature of SHELXL (36). Final refinement steps included cycles 
of model building where disorder and additional solvent molecules were 
added. The final R-factor is 19.2% with an Rfree of 21.8%. 

To obtain the amiloride, B428, or phenylguanidine micro-urokinase 
complex structures, crystals of urokinase were placed in 50 μl of 
crystallization mother liquor to which 0.5 μl of a 1 mg/10 μl compound 
solution was added. The solid compound was obtained from the Abbott 
chemical repository and was initially dissolved in Me2SO. Crystals were 
allowed to incubate for 12-15 h at 24 °C and prepared for data collection 
in a manner identical to that of the native crystals. Data were collected 
on a Rigaku RTP 300 RC rotating anode source equipped with an 
RAXISII detector at 160 K by the method of flash freezing. Data were 
processed using the HKL program suite (33). Initial electron density 
maps were calculated using the program package XPLOR (35) and the 
1.5 Å native model. All electron density maps were inspected on a 
Silicon Graphics INDIGO2 workstation using QUANTA 97, and the 
orientations of all compounds were clearly visualized in the initial 
2Fo - Fc map. The complexes were refined to 2.0 Å resolution using the 
program package XPLOR. Refinement consisted of alternating steps of 
positional and B-factor refinement. Ordered solvent molecules were 
identified as positive peaks in the Fo - Fc map that were 4 σ above 
noise. 

Table I summarizes statistics for all micro-urokinase models. All 
data are between 89 and 90% complete with a merging Rsym between 7 
and 11% and an I/σ between 12 and 15. The native model is refined to 
an R-factor of 19.2% and Rfree of 21.8% at 1.5 Å resolution. The overall 
B-factor for the protein is 12 Å², and the overall B-factor for the 337 
ordered solvent molecules is 26 Å². The current native model also 



Crystal Structures of Urokinase at High Resolution 



7241 



Table I 
Data quality statistics 

Model            Resolution (Å)   Complete (%)   I/σ    Rsym (squared)ᵃ   R-factorᵇ (%)      Rfreeᶜ (%) 
Native           Overall                                                  19.2               21.8 
                 1.53-1.50        95.3           9      0.113             21.2 (1.57-1.50)   25.8 (1.57-1.50) 
B428             Overall          89.9           16.8   0.083             20.9               27.7 
                 2.05-2.0         88.4           5      0.203             20.0               29.4 
Amiloride        Overall          99.8           12.4   0.108             21.5               29.1 
                 2.3-2.2          99.8           4.3    0.358             19.1               26.9 
Phenylguanidine  Overall          90.3           13.5   0.086             18.9               22.1 
                 2.06-2.00        94.2           4.5    0.254             24.3               24.8 

ᵃ Rsym = Σ(I - ⟨I⟩)² / Σ(I²) 

ᵇ R-factor = Σ|Fo - Fc| / ΣFo 

ᶜ Value of the R-factor where 10% of the data were randomly removed from the refinement. 



contains three ordered sulfate ions, and two alternate side chain 
conformations located at the active site. All backbone atoms are well 
defined in the final 2Fo - Fc map with atomic B-factors at or below 30 
Å². The B428 model is refined to 2.0 Å resolution with an R-factor of 20.9% 
and an Rfree of 27.7%, while the amiloride model is refined to 2.2 Å 
resolution with an R-factor of 21.5% and an Rfree of 29.1%. The 
phenylguanidine model is refined to 2.0 Å resolution with an R-factor of 
18.9% and an Rfree of 22.1%. Data for the complex structures were of 
quality comparable with that of native structures collected under the 
same conditions on a rotating anode source. 
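
The agreement statistics quoted in this section can be illustrated with a short sketch. It assumes the conventional definition R = Σ|Fo - Fc| / ΣFo for the R-factor, takes the squared form of Rsym from the Table I footnote, and holds out 10% of reflections for Rfree as stated there. The function names and the toy numbers are hypothetical. This is not the XPLOR or SHELXL implementation.

```python
import random


def r_factor(f_obs, f_calc):
    """Conventional crystallographic R-factor over paired amplitudes."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def r_sym_squared(observations):
    """Rsym in the squared form: sum (I - <I>)^2 / sum I^2.

    observations maps a reflection index (h, k, l) to the list of
    symmetry-related intensity measurements of that reflection.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        mean_i = sum(intensities) / len(intensities)
        numerator += sum((i - mean_i) ** 2 for i in intensities)
        denominator += sum(i ** 2 for i in intensities)
    return numerator / denominator


def split_free_set(n_reflections, fraction=0.10, seed=0):
    """Randomly set aside a fraction of reflection indices for Rfree.

    Returns (working_indices, free_indices); the free set is never
    used in refinement, only for cross-validation.
    """
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = round(n_reflections * fraction)
    return sorted(indices[n_free:]), sorted(indices[:n_free])


# Toy amplitudes (hypothetical values, for illustration only).
example_r = r_factor([10.0, 20.0], [9.0, 22.0])
working, free = split_free_set(100)
print(f"R = {example_r:.3f}, working set = {len(working)}, free set = {len(free)}")
```

Because the free reflections never enter refinement, a gap between R-factor and Rfree such as those reported for the complexes gives a rough indication of how much the model has been fitted to noise.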

RESULTS 

Redesign of LMW Urokinase — To redesign the LMW uroki- 
nase sequence for the purpose of improving the crystal charac- 
teristics, the LMW urokinase coordinate file (Protein Data 
Bank entry 1LMW) was examined for sequences of excessively 
high B-factor, suggesting areas of disorder. The hypothesis is 
that areas of high disorder in the structure may contribute to 
the overall disorder of the crystals and/or may interfere with 
optimal crystal packing. The LMW urokinase structure consists 
of residues 136-158 of the A-chain and 159-411 of the 
B-chain connected by a disulfide bridge between Cys148 and 
Cys279 (urokinase numbering).¹ The B-chain corresponds to the 
serine protease domain, whereas the 21 residue A-chain lacks 
the kringle and epidermal growth factor domains present in 
full-length urokinase. The A-chain is reported to be an area of 
high disorder (1), and examination of the protein data bank 
coordinate file (Protein Data Bank entry 1LMW) reveals that 
residues 148-155 of the A-chain have an average B-factor of 64 
ranging from 26 for the disulfide-linked sulfur of residue 
Cys148 to 110 for Pro155. The very high B-factors for the 
LMW urokinase A-chain confirm this observation. Consequently, 
the A-chain was removed as a first step in the redesign. 
Furthermore, to remove the resultant free thiol on the 
B-chain, Cys279 was mutated to an alanine. 

Further examination of the LMW urokinase coordinate file 
indicates a second area of disorder consisting of residues 405-411 
of the C terminus where the average B-factor is 147 Å². 
Residues 407-411 represent a five residue extension in urokinase 
relative to other trypsin-like serine proteases. However, 
because residues 405-406 also have high atomic B-factors, the 
entire 405-411 segment was removed. The final potential site 
for disorder is the glycosylation site at residue 302. This 
glycosylation site was removed by an N302Q mutation to facilitate 
expression of the glycosylation-free protein in baculovirus. 
Hence, the re-engineered urokinase (micro-urokinase) consists 



¹ The urokinase numbering system is used for discussion of the sequence 
re-engineering work, whereas the chymotrypsin numbering system 
as aligned by Ref. 1 is used for discussion of the serine protease 
domain structure for micro-urokinase. 



of residues Ile159-Lys404 (Ile16-Lys243, chymotrypsin numbering 
system) with the two point mutations C279A (C122A) and 
N302Q (N145Q). 
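
The B-factor screen used to pick the truncation points can be sketched as a small parser over PDB ATOM records. The column positions (chain identifier in column 22, residue number in columns 23-26, B-factor in columns 61-66) follow the fixed-width PDB format. The helper `make_atom_line`, the 60 Å² threshold and the synthetic records are assumptions for illustration only and are not taken from entry 1LMW.

```python
from collections import defaultdict


def make_atom_line(serial, name, res_name, chain, res_seq, b_factor):
    """Build a synthetic fixed-column PDB ATOM record (coordinates set to zero)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain}{res_seq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b_factor:6.2f}")


def mean_b_by_residue(pdb_lines, chain_id):
    """Average the atomic B-factors of each residue in one chain."""
    totals = defaultdict(lambda: [0.0, 0])
    for line in pdb_lines:
        if not line.startswith("ATOM") or len(line) < 66:
            continue
        if line[21] != chain_id:
            continue
        res_seq = int(line[22:26])
        totals[res_seq][0] += float(line[60:66])
        totals[res_seq][1] += 1
    return {res: total / count for res, (total, count) in totals.items()}


def disordered_residues(mean_b, threshold):
    """Residue numbers whose mean B-factor exceeds the threshold."""
    return sorted(res for res, b in mean_b.items() if b > threshold)


# Synthetic records with hypothetical B-factors, for illustration only.
records = [
    make_atom_line(1, "CA", "CYS", "A", 148, 26.0),
    make_atom_line(2, "SG", "CYS", "A", 148, 30.0),
    make_atom_line(3, "CA", "PRO", "A", 155, 110.0),
    make_atom_line(4, "CB", "PRO", "A", 155, 100.0),
    make_atom_line(5, "CA", "ILE", "A", 160, 15.0),
    make_atom_line(6, "CB", "ILE", "A", 160, 17.0),
]
means = mean_b_by_residue(records, "A")
flagged = disordered_residues(means, threshold=60.0)
print(flagged)
```

Run over a real coordinate file, consecutive flagged residues would mark candidate segments for deletion, which is the reasoning applied above to the A-chain and to the 405-411 tail.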

Micro-urokinase Crystal Packing — Micro-urokinase crystallizes 
with a monomer in the asymmetric unit (P2₁2₁2₁), 
whereas the LMW urokinase crystal form has a dimer in the 
asymmetric unit (R3) with intimate contacts at the substrate-binding 
site. Specifically, in LMW urokinase, residues 94-101 
from each molecule (chymotrypsin numbering system as 
aligned by Ref. 1)¹ form a series of intermolecular main chain 
hydrogen bonds resulting in an extended four stranded β-sheet 
(1). From the LMW urokinase structure, it was seen that this 
loop decreases the size of the S4 pocket relative to that at the 
substrate-binding site of other serine proteases such as thrombin, 
Factor Xa and tissue plasminogen activator (1, 37-39). 
Hence, this loop provides a critical structural feature of the 
substrate-binding groove. However, because of the close crystal 
contact at this site in the LMW urokinase crystals, the possibility 
existed that the structure of the substrate-binding site 
may be distorted or conformationally restricted. The new crystal 
form of micro-urokinase lacks the close crystal contact present 
in LMW urokinase, and an overlay of the two structures 
indicates that the conformation of this loop is essentially identical 
in the two crystal forms. Consequently, it is unlikely that 
packing in either crystal system affects the conformation of this 
loop and the resultant shape of the S4 pocket, although the 
more open micro-urokinase packing may allow for 
inhibitor-induced conformational shifts. 

Examination of crystal packing at the A-chain-binding cleft 
gives insight into why micro-urokinase yields different lattice 
packing and better diffracting crystals (a sample of the final 
2Fo - Fc electron density map at 1.5 Å resolution is shown in 
Fig. 1A). In LMW urokinase, the A-chain binds in a cleft composed 
of residues 25-29, 116-122, and 201-208. In the crystal 
structure of micro-urokinase, there is no A-chain, and the A- 
chain-binding cleft is partially occupied by a symmetry related 
molecule. Specifically, a hydrophobic loop extending from 144 
to 150 in the symmetry related molecule is directly bound at 
the A-chain site such that Tyr149-OH of the loop is involved in 
two hydrogen bonds at the A-chain cleft (Ser^°^-N and Ser^^^- 
O). In LMW urokinase, the A-chain blocks this set of interac- 
tions. Thus, in micro-urokinase, removal of the A-chain exposes 
a new "binding site" for the 144—150 loop of another micro- 
urokinase molecule permitting a new lattice to form. This in- 
teraction at the A-chain cleft probably contributes to the im- 
proved crystal quality by being both a site of nucleation as well 
as by facilitating very close contact between adjacent 
molecules. 







Fig. 1. A, final 2Fo - Fc electron density map contoured at 1 σ for 
native micro-urokinase at 1.5 Å resolution. Residues 146-148 are depicted 
in thick lines. B, 2Fo - Fc (purple) and Fo - Fc (green) at His99. 
The 2Fo - Fc map is contoured at 1 σ, and the Fo - Fc is contoured at 
3 σ. The map is for refinement of the side chain in one conformation. C, 
2Fo - Fc (purple) and Fo - Fc (green) at Cys42. The 2Fo - Fc map is 
contoured at 1 σ, and the Fo - Fc is contoured at 3 σ. The map is for 
refinement of the side chain in one conformation. 

Micro-urokinase and LMW urokinase are nearly identical in 
structure (overall rms deviation for main chain atoms, 0.8 Å) 
with one significant structural change near a site of re-engineering. 
As discussed above, removal of the A-chain results in 
an empty cavity. One loop (201-210) forming this site undergoes 
a conformational shift relative to LMW urokinase with 
rms deviation (main chain) ranging from 1.1 to 1.8 Å with the 
largest shift being for Arg^^^. However, although this loop is 
involved in a crystal packing interaction, the conformation of 
the 144-150 loop of the symmetry related molecule is the same for 
both micro-urokinase and LMW urokinase. Other sites of variation 
include the flexible loop at residues 37-37D (rms deviation 
main chain, 1.7-3.5 Å), residues 17-19 (rms deviation 
main chain, 1.1-2.1 Å) and residues 185B-186 (rms deviation 
main chain, 1.7 Å). All areas were of high B-factor in the LMW 
urokinase structure (B-factor > 60-90 Å²) but of significantly 
lower B-factor in the micro-urokinase structure (B-factor < 20 
Å²) with the exception of residues 17-19, which were of low 
B-factors in both structures. The 17-19 segment was clearly 
defined in the final 2Fo - Fc electron density maps of micro-urokinase 
and is not near any re-engineered sites. Residues 
185B-186 were remodeled in the higher-resolution structure. 
In the lower resolution LMW urokinase structure, Trp186 was 
exposed to solvent and Gln185B was buried. The higher resolution 
data clearly placed Trp186 in the protein core with Gln185B 
exposed to solvent. 
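
The rms deviations quoted for main chain atoms reduce to a simple calculation once the two structures have been superposed and their atoms paired. The sketch below assumes that superposition has already been done elsewhere; the function name and the toy coordinates are hypothetical.

```python
import math


def rms_deviation(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists.

    Each argument is a sequence of (x, y, z) tuples in angstroms. The
    structures are assumed to be superposed already; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy main chain fragment displaced by 1 angstrom along z (illustration only).
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.3, 1.2, 0.0)]
shifted = [(x, y, z + 1.0) for x, y, z in reference]
fragment_rmsd = rms_deviation(reference, shifted)
print(f"{fragment_rmsd:.2f}")
```

Applying the same function to a single loop rather than to the whole chain gives per-segment values of the kind reported above for residues 37-37D, 17-19 and 185B-186.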

Active Site of Native Micro-urokinase — Like the overall mo- 
lecular fold, the active sites of LMW urokinase and micro- 
urokinase are nearly identical (rms deviation, <0.8 A). The 
higher resolution data did not depict any large side chain 
movements relative to LMW urokinase but did show an alternate 
side chain conformation for two residues (Fig. 1, B and C) 
in addition to a bound sulfate ion (see Fig. 3C). The sulfate ion 
is bound near the oxyanion hole (40), where O1 is accepting 
hydrogen bonds from Gly193-NH (2.8 Å) and Ser195-OH (2.8 Å), 
whereas O2 is accepting a hydrogen bond from His57-Nε2 (2.8 
Å). Hence, the higher resolution data revealed more structural 
details at the active site. 

In Fig. 1B, native 1.5 Å 2Fo - Fc (contoured at 1 σ) and Fo - Fc 
(contoured at 3 σ) electron density maps depict that the side 
chain of His99 is in multiple conformations. These maps were 
calculated before the alternate conformation had been included 



Table II 
Inhibition constants determined for LMW urokinase and micro-urokinase 
Ring numbering is shown in conjunction with the chemical structure 
for each inhibitor. 

                   Ki (μM) 
Inhibitor          LMW urokinase     micro-urokinase 
B428               0.490 ± 0.018     0.512 ± 0.022 
Amiloride          7.2 ± 0.2         6.9 ± 0.4 
Phenylguanidine    20.6 ± 1.0        17.4 ± 1.1 



in the model. As presented in Fig. 1B, one His99 conformation 
is identical to that observed with LMW urokinase. In this 
conformation, His99-Nδ1 accepts a hydrogen bond from 
Tyr94-OH (2.9 Å). In the alternate conformation (modeled into 
the green positive peak; Fig. 1B), the His99 imidazole is rotated 
approximately 90° about the Cβ-Cγ bond resulting in a different 
hydrogen bonding pattern. Here, His99-Nδ1 can donate a 
hydrogen bond to Asp^^'^-OSl (3.2 A). The His^'^ side chain 
forms part of both the S2 and S4 pockets. Hence, a change in the 
conformation of His99 results in a change in the overall shape of 
S2 and S4, suggesting that the side chain movement would 
affect a drug design strategy directed toward the substrate-binding 
groove. 
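
The alternate imidazole conformation corresponds geometrically to rotating the ring atoms about the Cβ-Cγ bond. The sketch below applies Rodrigues' rotation formula to one point about an arbitrary axis. The coordinates are hypothetical placeholders rather than values from the micro-urokinase model, and the function name is an assumption.

```python
import math


def rotate_about_axis(point, axis_start, axis_end, angle_deg):
    """Rotate a point about the line through axis_start and axis_end.

    Uses Rodrigues' rotation formula with the right-hand rule about the
    direction axis_start -> axis_end. All inputs are (x, y, z) tuples.
    """
    axis = [e - s for s, e in zip(axis_start, axis_end)]
    norm = math.sqrt(sum(c * c for c in axis))
    k = [c / norm for c in axis]                      # unit vector along the bond
    v = [p - s for p, s in zip(point, axis_start)]    # point relative to the axis
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    k_dot_v = sum(a * b for a, b in zip(k, v))
    rotated = [
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
        for i in range(3)
    ]
    return tuple(r + s for r, s in zip(rotated, axis_start))


# Hypothetical coordinates: Cbeta at the origin, Cgamma 1.5 A along z,
# and one ring atom lying off the bond axis.
c_beta = (0.0, 0.0, 0.0)
c_gamma = (0.0, 0.0, 1.5)
ring_atom = (1.0, 0.0, 2.0)
moved = rotate_about_axis(ring_atom, c_beta, c_gamma, 90.0)
print(tuple(round(c, 3) for c in moved))
```

Rotating every ring atom by the same angle generates a candidate alternate conformer, which can then be compared against the positive Fo - Fc density.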

The side chain of Cys42 is also observed in two side chain 
conformations and is near the active site (Fig. 1C). In what is 
likely the major conformation, the Cys42-Cys58 disulfide bridge 
is intact. However, in the alternate conformation, the disulfide 
is broken and the Cys42 thiol group lies in a small hydrophobic 
pocket formed by the side chains of Phe^®, Ile^^, and Val"*^. This 
side chain shift is unexpected as the Cys42-Cys58 disulfide 
bridge is present in all trypsin-like serine protease structures, 
and its proximity to the catalytic triad suggests that it may 
structurally stabilize the active site. Hence, one might expect 
the catalytic activity to be affected when this disulfide bridge is 
broken. On the other hand, one must note that this observation 
occurs in the solid state and that further solution work would 
be necessary to determine its physiological significance. 







Fig. 2. Lineweaver-Burk analyses of B428 inhibition of micro-urokinase 
were performed in amidolytic chromogenic assays with S2444 as described 
under "Experimental Procedures." S2444 substrate concentrations were 
0.8, 1.0, 1.3, 2.0, and 4.0 mM. B428 concentrations were 0 nM (▼), 
250 nM (▲), 500 nM (●), and 1000 nM (■). Data represent the means of 
triplicate determinations. Ki values were determined by replots of slope 
versus inhibitor concentration (inset) and are represented in Table II. 
[Plot x-axis: 1/Substrate (M⁻¹).] 



Examination of crystal packing at the active site reveals that 
the micro-urokinase molecules pack forming a solvent channel 
that leads to the active site groove. Therefore, small molecule 
inhibitors may diffuse into the crystal and bind at the active 
site. This is important from a structure-based drug design 
perspective because it facilitates soaking as a method of form- 
ing protein-compound complex crystals. The soaking method 
was used to obtain crystal structures with the three known 
urokinase inhibitors, B428, amiloride, and phenylguanidine. 
These structures were obtained at high resolution and provide 
a starting point for structure-based drug design of a nonpep- 
tidic urokinase inhibitor. 

B428 — B428 has been reported to inhibit human urokinase 
with an IC50 value of 0.320 μM (Refs. 25 and 26 and Table II). 
B428 inhibition was tested versus LMW urokinase and micro-urokinase, 
and Fig. 2 presents the Lineweaver-Burk analysis 
for the effect of B428 on the activity of micro-urokinase. The 
results show that B428 competitively inhibits micro-urokinase 
as observed for the native enzyme (25, 26). As listed in Table II, 
B428 inhibits LMW urokinase with a Ki of 0.490 μM while 
inhibiting micro-urokinase with a Ki of 0.512 μM. Hence, Ki 
values for the native and re-engineered forms of the protein are 
essentially identical and are consistent with reported IC50 
values (25, 26). 

The B428-micro-urokinase co-crystal structure was completed 
to 2.0 Å resolution. In the complex structure, the 2Fo - Fc 
and Fo - Fc maps indicate that His99 is in two conformations 
as observed in the native structure although Cys42 is observed 
only in the conformation in which the Cys42-Cys58 disulfide 
bridge is intact. It is unclear why only one conformation is 
observed for the Cys42-Cys58 disulfide. In the native structure, 
the alternate conformation became visible at high resolution. 
Hence, one possibility is that the second conformation is not visible 
in the lower resolution electron density map. Another explanation 
is that inhibitor binding may induce a shift to a single 
conformation or that the inhibitor may only bind to the protein 
form where the disulfide is intact. Further experiments at high 
resolution will be necessary to fully understand this phenomenon. 
Fig. 3A shows the 2Fo - Fc (contoured at 1 σ) and Fo - Fc 
(contoured at 3 σ) electron density maps calculated in the 





Fig. 3. A, initial 2Fo - Fc (purple) and Fo - Fc (green) maps contoured 
at 1 and 3 σ, respectively, for the binding site of B428 before refinement. 
B, molecular surface as calculated by the program package QUANTA 
(Molecular Simulations Inc.) depicting interactions between B428 and 
micro-urokinase. The inhibitor and inhibitor surface are shown in orange, 
whereas the protein and the protein surface are shown in cyan. C, 
view of B428 bound at the S1 site of urokinase. The S2 site between 
His57 and His99 is also shown as well as the S4 site. An ordered sulfate 
ion is also shown bound near the oxyanion hole. 

absence of inhibitor and before any refinement cycles. All atoms 
of the inhibitor are clearly defined in both maps, and the 
compound is found to bind at the S1 pocket as might be predicted 
from its net positive charge. 

Interactions between B428 and the S1 pocket are consistent 
with observations for trypsin and other trypsin-like enzymes 
(41-45). Nearly all atoms of B428 are in van der Waals' or 
hydrogen bonding contact with the S1 site (Fig. 3, B and C). The 
inhibitor does not occupy other pockets of the substrate-binding 
groove. The benzothiophene ring is in contact with the rim of 
the S1 site that is composed of the Cys191-Cys220 disulfide 
bridge and the main chain atoms of Ser^^^-Cys^^^ and Gln^^^- 
Cys^^^ In the pocket, the thiophene ring is also in contact with 
the side chains of VaV'^^^ Ser^^^ Asp^^\ and Ser^^^ The ami- 
dine is donating hydrogen bonds to Ser^^'^-Oy (3.0 A), Asp^^®- 
OSl (2.8 A), Asp^«^-OS2 (2.8 A), and Gly^^^-O (2.7 A) (Fig. SB). 
Hence, both hydrophobic and hydrophilic interactions occur at S1. 

In addition to interactions at S1, the 4-iodo group is pointing 
out of the S1 pocket away from the substrate-binding groove 
and is making van der Waals' interactions with the side chain 
of Cys220 and the main chain atoms of Gly^^^. These residues 
form part of a subpocket composed of the disulfide bridge at 
Cys191-Cys220, residues Gly218 and Ser146, and the side chain of 
Lys143. This pocket has been termed the S1β pocket because of 
its proximity to the primary site (Fig. 3C). It is reported that 
the 4-iodo group of B428 confers a 10-fold increase in binding 
potency relative to the 4-hydro compound (25, 26). This observation 
is consistent with the B428-urokinase crystal structure 
where the 4-iodo group partially accesses the S1β pocket. Furthermore, 
B623 inhibits urokinase with an IC50 of 0.07 μM (25, 
26). Based upon the crystal structure of B428-micro-urokinase, 
it is possible that this larger 4-substituent is occupying more of 
the S1β pocket² and consequently binds more tightly to urokinase. 
Hence, access to this novel pocket has been shown to 
confer an increase in binding potency and may serve as a site 
for further substitution in structure-based drug design. 

Fig. 4. A, initial 2Fo - Fc (purple) and Fo - Fc (green) maps contoured 
at 1 and 3 σ, respectively, for the binding site of amiloride before 
refinement; B, molecular surface as calculated by the program package 
QUANTA (Molecular Simulations Inc.) depicting interactions between 
amiloride and micro-urokinase. The inhibitor and inhibitor surface are 
shown in peach, whereas the protein and protein surface are shown in cyan. 
C, overlay of the crystal structures of amiloride (purple) and B428 
(orange) micro-urokinase showing that the halogen atoms of each inhibitor 
are occupying the same site. 

Examination of the crystal structure of B428-urokinase 
shows that the 5 and 6 positions of the 
benzo(b)thiophene-2-carboxamidine are also open for substitution, 
whereas the 3 and 7 positions are buried within the S1 pocket and 
therefore less likely to accommodate a substituent. Of these, the 5 
position does not directly point toward any pockets of the urokinase 
molecule because it points toward Gln192 and out toward bulk 
solvent. Hence, substitution at this position is less likely to 


² The crystal structure of B623 in complex with urokinase could not 
be completed because of solubility issues with the compound. 



confer a large increase in binding potency. On the other hand, 
the 6 position points toward the urokinase catalytic site although 
the position appears partially blocked by the side chain 
of the active site Ser195. The distance from Ser195-OH to the 6 
position carbon is 3.2 Å; therefore incorporation of a substitution 
at this position may require a shifting of the benzothiophene 
scaffold away from Ser195. Additionally, substitutions at 
the 6 position would not orient toward the substrate-binding 
groove accessed by Glu-Gly-Arg-chloromethyl ketone. Substitutions 
at the 6 position would have to bend back toward the 
substrate-binding site or access other subsites. Nevertheless, 
the 4 and 6 positions appear to be the best substitution sites 
toward increasing the binding potency of B428, and both sets of 
substitutions will likely occupy sites apart from the 
substrate-binding groove. 

Amiloride — Amiloride has been reported to inhibit human 
urokinase with a Ki (24) or IC50 of 7 μM (25, 26). As observed 
with B428, amiloride also competitively inhibits LMW urokinase 
and micro-urokinase with similar Ki values (Ki = 7.2 μM for 
LMW urokinase, and Ki = 6.9 μM for micro-urokinase). Amiloride 
is a weaker urokinase inhibitor than B428 (Table II) but 
may have more favorable pharmacological properties because 
the compound is an orally active commercial drug (46). To 
compare the binding modes of amiloride and B428 and to establish 
strategies for development of a more potent amiloride-based 
urokinase inhibitor, the co-crystal structure of amiloride 
micro-urokinase was completed at 2.2 Å resolution. 

Examination of the 2Fo - Fc (contoured at 1 σ) and Fo - Fc 
(contoured at 3 σ) electron density maps at the active site 
shows that all atoms of the inhibitor are clearly defined in both 
maps (Fig. 4A). In addition, the maps show His99 in two 
conformations and the Cys42-Cys58 disulfide bridge intact as observed 
in the B428 complex. The data also indicate that amiloride 
binds at the S1 pocket as observed with B428 (Fig. 4C). 

The crystal structure of amiloride-micro-urokinase indicates 
that amiloride is making more hydrogen bonding interactions 
at the S1 site than B428 while maintaining some of the van der 
Waals' interactions within the pocket. The size of the amiloride 
pyrazine scaffold is smaller than the B428 benzothiophene 
such that even though the pyrazine ring is in contact with the 
rim of the S1 pocket as observed for B428, the extent of the 
packing interactions is smaller. In place of the thiophene ring, 
the 3-amino and 2-acylguanidine groups of amiloride are making 
hydrogen bonding interactions. Specifically, the 3-amino 
group is packed underneath the side chain of Ser^®^ as shown 
in Fig. 4B where it is donating a hydrogen bond to Ser^^^-Oy 
(3.1 A). The carbonyl of the acyl guanindine group is accepting 
a hydrogen bond (2.9 A) from a buried solvent molecule bound 
directly above Tyr^^®. The guanidine-NH is donating a hydro- 
gen bond to Gly^^^-O (3.1 A). As observed with B428, the 
amide-like nitrogens are donating hydrogen bonds to Gly^^^-O 
(2.7 A) and Asp^®^-O51(3.0 A) or to Asp^^®-052 (3.0 A), and 
Ser^^**-Oy (2.7 A). The hydrogen bonding geometry of the gua- 
nidinium group is also very similar to that observed for ArgP^ 
in the Glu-Gly-Arg-chloromethyl ketone-LMW urokinase structure 
(1). Hence, although the core scaffolds of both B428 and 
amiloride are bound at the S1 pocket, the nature of the interactions 
within the pocket is different. 

The crystal structure of amiloride-micro-urokinase reveals 
strategies for structure-based drug design of a more potent 
small molecule inhibitor. One potential site of substitution is 
the 6 position. The 6-chloro group of amiloride is accessing the 
S1β pocket as observed for the 4-iodo group of B428. Specifically, 
the 6-chloro group is in hydrophobic contact with the side 
chain of Cys^^** and the main chain atoms of Gly^^^ (Fig. 4C). 
Thus, although the chemical structures of B428 and amiloride 
are very different, interactions at the S1β pocket are nearly 
identical. Because of this similarity, one might substitute the 
6-chloro position of amiloride with larger groups such as iodine 
(present in B428) or a benzodioxol arylethenyl (present in 
B623), which were both shown to enhance the activity in the 
benzo(b)thiophene-2-carboxamidine series. The 3 position of 
amiloride within the S1 pocket is another site for substitution. 
However, substitutions at this site are expected to point toward 
Gln192 and then out toward bulk solvent as observed for the 5 
position of B428. Thus, use of a rigid linker may be necessary to 
redirect substitutions toward the protein including the 
substrate-binding groove. In summary, substitutions of the amiloride 
scaffold should occur at the 5 and 6 positions to provide 
direct access to the S1β pocket or indirect access to other sites 
on the protein. 

Phenylguanidine — Phenylguanidine inhibits urokinase with 
a Ki of 20.6 μM (27) and is therefore a weaker inhibitor of 
urokinase than either amiloride or B428 (Table II). This inhibitor 
also competitively inhibits micro-urokinase with a Ki consistent 
with the LMW form (Ki = 20.6 μM for LMW urokinase, 
and Ki = 17.4 μM for micro-urokinase). To compare the binding 
mode of this inhibitor to amiloride and B428 and to determine 
potential sites of substitution, the co-crystal structure of 
phenylguanidine-micro-urokinase was completed at 2.0 Å 
resolution. 
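
The potency ranking of the three inhibitors can be put on an energy scale with the standard relation ΔG° = RT ln Ki. The sketch below uses the micro-urokinase Ki values from Table II and the 37 °C assay temperature. Treating each Ki as a dissociation constant referenced to a 1 M standard state is an assumption of this illustration; the free energies themselves are not reported in the text.

```python
import math

GAS_CONSTANT = 8.314462618      # J / (mol K)
ASSAY_TEMPERATURE = 310.15      # K, i.e. 37 degrees C as used in the assays


def binding_free_energy(ki_molar, temperature=ASSAY_TEMPERATURE):
    """Standard binding free energy in kJ/mol from an inhibition constant in M."""
    return GAS_CONSTANT * temperature * math.log(ki_molar) / 1000.0


# Micro-urokinase Ki values from Table II, converted from uM to M.
ki_micro_urokinase = {
    "B428": 0.512e-6,
    "amiloride": 6.9e-6,
    "phenylguanidine": 17.4e-6,
}

free_energies = {name: binding_free_energy(ki)
                 for name, ki in ki_micro_urokinase.items()}

for name, dg in free_energies.items():
    print(f"{name:16s} {dg:7.2f} kJ/mol")
```

On this scale a 10-fold change in Ki is worth about 5.9 kJ/mol at 37 °C, which gives a sense of how much binding energy a substituent reaching a new subsite would need to contribute.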

The phenylguanidine-micro-urokinase active site structure 
is very similar to that in the presence of B428 and amiloride. 
His99 is observed in multiple conformations while the Cys42-Cys58 
disulfide bridge is intact. Additionally, the 2Fo - Fc 
(contoured at 1 σ) and Fo - Fc (contoured at 3 σ) electron 
density maps (Fig. 5A) obtained using the urokinase model in 
the absence of inhibitor and before any refinement cycles show 
that all atoms of the inhibitor are clearly defined in both maps. 
The inhibitor was found to bind at the S1 pocket (Fig. 5B). 

Even though both amiloride and phenylguanidine have scaf- 
folds of the same size, the phenyl ring of phenylguanidine binds 
very differently from the pyrazine ring of amiloride (Fig. 5, B 
and C). Specifically, the phenylguanidine ring packs under- 
neath Ser^^^ and is interacting with the main chain atoms of 
VaP^^-Trp=^^^ as well as the side chain of Val^^^. The ring also 
interacts with the main chain atoms of Ser^^°-Cys^®^ as well as 
the side chain of Ser^^°. The differential ring packing is most 
likely due to amiloride possessing one additional linker atom 
between the guanidine and aromatic groups relative to phenyl- 
guanidine (Table II) because the guanidine groups are oriented 
very similarly. Specifically, the guanidine-NH is donating a 
hydrogen bond to Gly^'®-0 (3.0 A), whereas the amidine-like 
nitrogens are donating hydrogen bonds to Gly^^®-0 (2.9 A) and 
Asp'«^-061 (2.9 A) or to Asp^«^-052 (3.0 A) and Ser^^^-Ov (3.3 
A). Thus, it is likely that the core scaffold of amiloride (pyrazine 
ring) orients differently than the phenyl group of phenylguani- 
dine because the binding is being driven by the hydrogen bond- 
ing geometry of the guanidine groups rather than the van der 
Waals'/hydrogen bonding interactions of the core groups even 
though interactions of the core groups most certainly contrib- 
ute to the compound binding. 

The phenylguanidine urokinase structure also shows that 
Gln192 has changed conformation and is in hydrophobic contact 
with the inhibitor (Fig. 5B) such that it is blocking the entrance 
to the S1β pocket. In the native and the B428 or amiloride 
complex structures, the S1β pocket is open where Gln192 is 
accepting a hydrogen bond from Lys143 (3.3 Å) and donating a 
hydrogen bond to Tyr^^^ (3.1 A). Thus, a conformational shift of 
this side chain requires breaking two hydrogen bonds. This is 
not the case for other serine proteases such as thrombin where 
there is no hydrogen bonding partner for Glu192 in either position. 
Here, there is less of an energy barrier to a conformational 
shift of Glu192, and the side chain may be found in both 
conformations (49, 50). For urokinase, it appears that the binding 
of certain inhibitors such as phenylguanidine does break the 
two Gln192 hydrogen bonds and conformationally shift Gln192 
to maximize hydrophobic desolvation of the compound. Hence, 
Gln192 may be induced to shift conformation, and because 
Gln192 may act as a switch at the entrance to S1β from S1, 
noting the orientation of this side chain is important in a drug 
design strategy. 

The crystal structure of phenylguanidine-urokinase suggests
a structure-based drug design strategy different from that with
B428 or amiloride. Both B428 and amiloride are capable of
directly accessing the S1β pocket, whereas the binding orientation
of phenylguanidine is such that a similar interaction
cannot be achieved by direct substitution of the phenyl ring
(Fig. 5C), even with movement of Gln192 to the S1β open position.
Specifically, as shown in Fig. 5 (B and C), the 2 and 3
positions could point toward the S1β pocket but are too far
away to support direct interaction with S1β. In fact, substitution
of the phenyl ring with halogens at both the 2 and 3
positions did not result in any increase in inhibitory potency
(27). On the other hand, substitution at position 4 with a
chloro or trifluoromethyl group resulted in an increase in inhibition
to Ki values of 6.8 and 6.5 µM, respectively (27). This 4
substitution is expected to orient toward the side chain of
Ser^^^ and may obtain binding energy from a favorable van der
Waals' packing interaction with Ser^^^ and the S1 pocket. The
5 and 6 positions are within the S1 pocket and therefore less
open for substitution. Because interactions with the S1β pocket
are expected to confer an increase in binding potency and
because phenylguanidine may not directly access this site,
modification of the scaffold may be a promising drug design
strategy for this series.

Fig. 5. A, initial 2Fo - Fc (purple) and Fo - Fc (green) maps contoured at 1σ and 3σ, respectively, for the binding site of phenylguanidine before
refinement. B, molecular surface of micro-urokinase as calculated by the program package QUANTA (Molecular Simulations Inc.) depicting
interactions between B428 and micro-urokinase. The inhibitor and inhibitor surface are shown in orange, whereas protein and protein surface
are shown in cyan. C, overlay of the crystal structures of amiloride (purple) and phenylguanidine (black) micro-urokinase, showing that the two
scaffolds occupy different areas of the S1 pocket.

Further examination of an overlay of the crystal structures of
phenylguanidine and amiloride micro-urokinase (Fig. 5C)
shows that the binding of the two scaffolds is complementary.
The lack of overlap between the two groups suggests that the
phenyl and pyrazine rings could be fused to form a
1-naphthylguanidine system. The naphthyl ring would be expected to
occupy the sites of both core scaffolds and could therefore
maintain the positive characteristics of both the phenylguanidine
and amiloride series. This would include utilization of the
4-chloro or 4-trifluoromethyl substitutions in the phenylguanidine
series as well as access to the S1β pocket exploited by
amiloride and B428. Hence, a merging of the amiloride and
phenylguanidine scaffolds would be predicted to benefit from
the additivity of both sites and create a more potent and easily
optimized urokinase inhibitor.

DISCUSSION 

Urokinase inhibitors have been shown to affect tumor metastasis
and growth in vivo, making urokinase an attractive
anti-cancer target. However, these existing compounds do not
have all of the properties necessary for a therapeutic agent and require
optimization. Crystallography-driven structure-based drug design
based on a series of ligand-protein crystal structures can
be utilized to optimize urokinase inhibition. The properties of
the protein crystals can affect the efficiency of structure-based
drug design because a larger number of more accurate structures
provides a better description of the relationship between
binding interactions and binding energy. Fortunately, advances
in molecular biology can be used to engineer the protein
to obtain crystal systems that facilitate faster and more exact
structure determinations and enhance the drug design cycle
(47). Such a method has been used to design a crystal system
for human urokinase for optimization of a urokinase inhibitor.

The sequence of LMW urokinase was redesigned to produce
a new crystal form that would permit a more ideal system for
structure-based drug design. Specifically, LMW urokinase was
re-engineered to minimize the areas of disorder that are likely to
cause suboptimal crystal packing. This recombinant protein,
micro-urokinase, produces crystals with close packing interactions
at the A-chain cleft, which would be blocked in LMW
urokinase. This close molecular packing results in crystals that
diffract to high resolution on a rotating anode source (1.6-2.0 Å).
However, even though the micro-urokinase molecules are
closely packed, the active site is both unoccupied and open to
solvent channels in the crystal. This property readily allows
compounds to be diffused into the crystal and has facilitated
the determination of crystal structures in the presence of three
reported urokinase inhibitors toward design of an anti-cancer
agent.

The micro-urokinase crystal system and soaking method were
used to determine the co-crystal structures of micro-urokinase
complexed with the inhibitors B428 (25, 26), amiloride (24),
and phenylguanidine (27). Each of the co-crystal structures
gives insight into favorable compound-protein interactions that
contribute to the binding of these inhibitors to urokinase. The
primary binding force is likely the hydrogen bonds between
each inhibitor's amidine or guanidine group and Asp189. This
salt bridge interaction is common to many guanidine or amidine
complexes with trypsin or trypsin-like serine proteases
such as thrombin, factor Xa, or tissue plasminogen activator
(41-45) and is observed for Arg-P1 in the Glu-Gly-Arg-chloromethyl
ketone LMW urokinase structure (1). In addition to the
hydrogen bonding interactions, van der Waals' packing between
the core scaffold and the S1 pocket may also contribute to
the overall binding energy. Hydrophobic packing at the S1
pocket is the primary binding interaction for substrates/inhibitors
in the chymotrypsin family of proteases, where the
S1 pocket contains no charged groups (48-51). Additionally, a
series of thrombin inhibitors that lack a positively charged
group to interact with Asp189 has been described (52, 53).
Hence, both hydrophilic and hydrophobic interactions at the
S1 pocket contribute to the binding of B428, amiloride, and
phenylguanidine, and these interactions are present in other
crystal structures.

Examination of the urokinase structures reveals a new additional
binding site adjacent to the S1 pocket. The site, termed
the S1β subpocket, is composed of the disulfide bridge at
Cys191-Cys220, residues Ser146 and Gly218, and the side chain of
Lys143. The S1β subpocket is also present in the LMW urokinase
structure (Protein Data Bank entry 1LMW) and is away
from any re-engineered sites. The crystal structure of
phenylguanidine-urokinase reveals that Gln192 may act as a switch for
the closing and opening of S1β. In the native and B428 or
amiloride complex structures, the S1β pocket is open, and
Gln192 is involved in two hydrogen bonds (Lys143 and Tyr^^^).
However, in the presence of other inhibitors such as
phenylguanidine or Glu-Gly-Arg-chloromethyl ketone (1), the hydrogen
bonds are broken, and the conformation of Gln192 is shifted
such that its side chain is in van der Waals' contact with the
inhibitor. In this conformation, the entrance to S1β is blocked,
and the shift is most likely induced to maximize interactions
with the inhibitor. Hence, although the S1β pocket may be
blocked by the induced movement of Gln192, its proximity to S1
makes it an attractive subsite for structure-based drug design.

The halogen atoms of B428 and amiloride are interacting
with the entrance to the S1β subsite (Gly218-Cys220). Interactions
at this site have been shown to confer a significant increase
in inhibitory potency for the benzo(b)thiophene-2-carboxamidine
series, where the 4-iodo group (IC50 = 0.32 µM) or
4-benzodioxolanylethenyl (IC50 = 0.07 µM) inhibit more strongly
than the 4-hydro compound (IC50 = 3.7 µM) (25, 26). The
increase in potency observed for both substitutions is most
likely due to packing interactions at the S1β pocket.
Phenylguanidine lacks a halogen atom to access the S1β pocket, and
examination of the structure reveals that the pocket cannot be
easily accessed by a direct substitution of the phenylguanidine
ring. However, an overlay of the phenylguanidine crystal structure
with that of amiloride reveals that the two scaffolds could
be merged to form a 1-guanidyl naphthalene. This compound
could, in turn, access the S1β pocket. Hence, urokinase
co-crystal structures with B428, amiloride, and phenylguanidine
indicate that all three scaffolds may provide either direct or
indirect access to the S1β pocket. Furthermore, this newly
described subsite has great potential for the future design of
more potent urokinase inhibitors for the treatment of cancer.
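
As a rough illustration of the potency comparison in the paragraph above, the fold-improvement of each 4-substituted benzo(b)thiophene-2-carboxamidine over the 4-hydro compound can be computed directly from the quoted IC50 values. The following is a minimal Python sketch. The dictionary labels and the helper name `fold_improvement` are illustrative assumptions, not anything taken from refs. 25 and 26, and a simple IC50 ratio is used here only as a comparison, not as an estimate of binding energy.

```python
# IC50 values in micromolar, as quoted in the text for the
# benzo(b)thiophene-2-carboxamidine series (25, 26).
# The labels are shorthand chosen for this sketch.
ic50_um = {
    "4-hydro": 3.7,
    "4-iodo": 0.32,
    "4-benzodioxolanyl analog": 0.07,
}


def fold_improvement(reference_ic50, compound_ic50):
    """Return how many times lower the compound's IC50 is than the reference's."""
    return reference_ic50 / compound_ic50


reference = ic50_um["4-hydro"]
for label, value in ic50_um.items():
    ratio = fold_improvement(reference, value)
    print(f"{label}: IC50 = {value} uM, {ratio:.1f}-fold vs. 4-hydro")
```

On these numbers, the 4-iodo substitution corresponds to roughly an order of magnitude lower IC50 than the 4-hydro compound, and the larger 4-substituent to a still larger ratio.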

Acknowledgments — We thank Dr. Bruce Littlefield of the Eisai Com- 
pany for initial supplies of B428 and Dr. Todd Rockway for synthesis of 
ε-amino caproic acid p-carbethoxyphenyl ester chloride. We also thank
Dr. Stephen Betz for critical examination of the manuscript and Dr. 
Jonathan Greer for many helpful discussions and critical examination 
of the manuscript. 

REFERENCES 

1. Spraggon, G., Phillips, C., Nowak, U. K., Ponting, C. P., Saunders, D., and Dobson, C. M. (1995) Structure 3, 681-691
2. Kohn, E. C. (1991) Pharmacol. Ther. 52, 235-244
3. Quax, P. H., van, L. R. T., Verspaget, H. W., and Verheijen, J. H. (1990) Cancer Res. 50, 1488-1494
4. Behrendt, N., Ronne, E., Ploug, M., Petri, T., Lober, D., Nielsen, L. S., Schleuning, W. D., Blasi, F., Appella, E., and Dano, K. (1990) J. Biol. Chem. 265, 6453-6460
5. Schmitt, M., Janicke, F., Moniwa, N., Chucholowski, N., and Pache. (1992) Biol. Chem. Hoppe-Seyler 373, 611-622
6. Duffy, M. J. (1990) Blood Coagul. Fibrinolysis 1, 681-687
7. Astedt, B., Billstrom, A., and Lecander, I. (1995) Fibrinolysis 9, 175-177
8. Evans, D., Sloan-Stakleff, K., Arvan, M., and Guyton, D. (1998) Clin. Exp. Metastasis 16, 353-357
9. Banerji, A., Fernandes, A., Bane, S., and Ahire, S. (1998) Cancer Lett. 129, 15-20
10. Kobayashi, H., Gotoh, J., Shinohara, H., Moniwa, N., and Terao, T. (1994) Thromb. Haemostasis 71, 474-480
11. Xiao, G., Liu, Y., Gentz, R., Sang, Q., Goldberg, I., and Shi, Y. (1999) Proc. Natl. Acad. Sci. U. S. A. 96, 3700-3705
12. Rabbani, S., Harakidas, P., Davidson, D., Henkin, J., and Mazar, A. (1995) Int. J. Cancer 63, 840-845
13. Alonso, D., Tejera, A., Farias, E., Joffe, E., and Gomez, D. (1998) Anticancer Res. 18, 4499-4504
14. Alonso, D., Farias, E., Ladeda, V., Davel, L., Puricelli, L., and Joffe, E. (1996) Breast Cancer Res. Treat. 40, 209-223
15. Jankun, J., Keck, R., Skrzypczak-Jankun, E., and Swiercz, R. (1997) Cancer Res. 57, 559-563
16. Browner, M. F., Smith, W. W., and Castelhano, A. L. (1995) Biochemistry 34, 6602-6610
17. Chand, P., Babu, Y. S., Bantia, S., Chu, N., Cole, L. B., and Kotian, P. L. (1997) J. Med. Chem. 40, 4030-4052
18. Erickson, J., Neidhart, D. J., VanDrie, J., Kempf, D. J., and Wang, X. C. (1990) Science 249, 527-533
19. Lam, P. Y., Jadhav, P. K., Eyermann, C. J., Hodge, C. N., and Ru, Y. (1994) Science 263, 380-384
20. Kurumbail, R. G., Stevens, A. M., Gierse, J. K., McDonald, J. J., Stegeman, R. A., and Pak, J. Y. (1996) Nature 384, 644-648
21. Luong, C., Miller, A., Barnett, J., Chow, J., and Ramesha, C. (1996) Nat. Struct. Biol. 3, 927-933
22. Verlinde, C. L., and Hol, W. G. (1994) Structure 2, 577-587
23. Luzatti, P. V. (1952) Acta Crystallogr. 5, 802-810
24. Vassalli, J. D., and Belin, D. (1987) FEBS Lett. 214, 187-191
25. Bridges, A. J., Lee, A., Schwartz, C. E., Towle, M. J., and Littlefield, B. A. (1993) Bioorg. Med. Chem. 1, 403-410
26. Towle, M. J., Lee, A., Maduakor, E. C., Schwartz, C. E., Bridges, A. J., and Littlefield, B. A. (1993) Cancer Res. 53, 2553-2559
27. Yang, H., Henkin, J., Kim, K. H., and Greer, J. (1990) J. Med. Chem. 33, 2956-2961
28. Lo, K.-M., and Gillies, S. D. (1991) Biochim. Biophys. Acta 1088, 217-224
29. Wang, J., Brdar, B., and Reich, E. (1995) Protein Sci. 4, 1758-1767
30. Barlow, G. H. (1976) Methods Enzymol. 45, 239-244
31. Segel, I. H. (1975) Enzyme Kinetics: Behavior and Analysis of Rapid Equilibrium and Steady-State Enzyme Systems, John Wiley & Sons, New York
32. Menegatti, E., Guarneri, M., Bolognesi, M., Ascenzi, P., and G., A. (1989) J. Enzyme Inhibition 2, 249-259
33. Otwinowski, Z., and Minor, W. (1997) Methods Enzymol. 276, 307-326
34. Navaza, J. (1994) Acta Crystallogr. A 50, 157-163
35. Brunger, A. T. (1993) X-PLOR, version 3.1, Yale University Press, New Haven, CT
36. Sheldrick, G. M. (1990) SHELX 97, Gottingen University, Gottingen, Germany
37. Bode, W., Mayr, I., Baumann, U., Huber, R., and Stone, S. R. (1989) EMBO J. 8, 3467-3475
38. Lamba, D., Bauer, M., Huber, R., Fischer, S., Rudolph, R., Kohnert, U., and Bode, W. (1996) J. Mol. Biol. 258, 117-135
39. Padmanabhan, K., Padmanabhan, K. P., Tulinsky, A., Park, C. H., Bode, W., Huber, R., Blankenship, D. T., Cardin, A. D., and Kisiel, W. (1993) J. Mol. Biol. 232, 947-966
40. Henderson, R. (1970) J. Mol. Biol. 54, 341-354
41. Bode, W., Turk, D., and Sturzebecher, J. (1990) Eur. J. Biochem. 193, 175-182
42. Bode, W., and Schwager, P. (1975) J. Mol. Biol. 98, 693-717
43. Banner, D. W., and Hadvary, P. (1991) J. Biol. Chem. 266, 20085-20093
44. Renatus, M., Bode, W., Huber, R., Sturzebecher, J., Prasa, D., Fischer, S., Kohnert, U., and Stubbs, M. (1997) J. Biol. Chem. 272, 21713-21719
45. Brandstetter, H., Kuhne, A., Bode, W., Huber, R., von der Saal, W., Wirthensohn, K., and Engh, R. (1996) J. Biol. Chem. 271, 29988-29992
46. Baba, W. I., Lant, A. F., Smith, A. J., Townshend, M. M., and Wilson, G. M. (1968) Clin. Pharmacol. Ther. 9, 318-327
47. Price, S., and Nagai, K. (1995) Curr. Opin. Biotechnol. 6, 425-430
48. Steitz, T., Henderson, R., and Blow, D. (1969) J. Mol. Biol. 46, 337-348
49. Blow, D. (1976) Acc. Chem. Res. 9, 145-152
50. Sigler, P., Jeffery, B., Matthews, B., and Blow, D. (1966) J. Mol. Biol. 15, 175-192
51. Birktoft, J., and Blow, D. (1972) J. Mol. Biol. 68, 187-240
52. Malikayil, J. A., Burkhart, J. P., Schreuder, H. A., Broersma, R. J., Tardif, C., Kutcher, L. W., Mehdi, S., Schatzman, G. L., Neises, B., and Peet, N. P. (1997) Biochemistry 36, 1034-1040
53. Das, J., and Kimball, S. D. (1995) Bioorg. Med. Chem. 3, 999-1007





Exhibit 25 



WORLD INTELLECTUAL PROPERTY ORGANIZATION
International Bureau

PCT

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

(51) International Patent Classification 7: C07K 14/435, 14/705, A61K 38/03, 38/08, 38/17

A1

(11) International Publication Number: WO 00/52044

(43) International Publication Date: 8 September 2000 (08.09.00)

(21) International Application Number: PCT/US00/05612

(22) International Filing Date: 2 March 2000 (02.03.00)

(30) Priority Data: 09/261,416, 3 March 1999 (03.03.99), US

(71) Applicant: THE BOARD OF TRUSTEES OF THE UNIVERSITY OF ARKANSAS [US/US]; 2404 North University Avenue, Little Rock, AR 72207-3608 (US).

(72) Inventors: O'BRIEN, Timothy, J.; 2610 North Pierce, Little Rock, AR 72207 (US). UNDERWOOD, Lowell, J.; Apartment K, 121 N. Jackson Street, Little Rock, AR 72205 (US).

(74) Agent: ADLER, Benjamin, A.; McGregor & Adler, 8011 Candle Lane, Houston, TX 77071 (US).

(81) Designated States: AU, CA, JP, European patent (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE).

Published
With international search report.
Before the expiration of the time limit for amending the claims and to be republished in the event of the receipt of amendments.

(54) Title: TRANSMEMBRANE SERINE PROTEASE OVEREXPRESSED IN OVARIAN CARCINOMA AND USES THEREOF

(57) Abstract

The present invention provides a TADG-12 protein and a DNA fragment encoding such protein. Also provided is a vector/host cell
capable of expressing the DNA. The present invention further provides various methods of early detection of associated ovarian and other
malignancies, and of interactive therapies for cancer treatment by utilizing the DNA and/or protein disclosed herein.




WO 00/52044                                        PCT/US00/05612

TRANSMEMBRANE SERINE PROTEASE OVEREXPRESSED IN
OVARIAN CARCINOMA AND USES THEREOF

BACKGROUND OF THE INVENTION

Cross-Reference to Related Application

This application is a continuation-in-part patent
application and claims the benefit of priority under 35 USC §120
of USSN 09/261,416, filed March 3, 1999.

Field of the Invention

The present invention relates generally to the fields of
cellular biology and diagnosis of neoplastic disease. More
specifically, the present invention relates to a transmembrane
serine protease termed Tumor Associated Differentially-Expressed
Gene-12 (TADG-12), which is overexpressed in ovarian carcinoma.

Description of the Related Art 

Tumor cells rely on the expression of a concert of
proteases to be released from their primary sites and move to
distant sites to inflict lethality. This metastatic nature is the result
of an aberrant expression pattern of proteases by tumor cells and
also by stromal cells surrounding the tumors [1-3]. For most
tumors to become metastatic, they must degrade their
surrounding extracellular matrix components, degrade basement
membranes to gain access to the bloodstream or lymph system,
and repeat this process in reverse fashion to settle in a secondary
host site [3-6]. All of these processes rely upon what now appears
to be a synchronized protease cascade. In addition, tumor cells
use the power of proteases to activate growth and angiogenic
factors that allow the tumor to grow progressively [1]. Therefore,
much research has been aimed at the identification of
tumor-associated proteases and the inhibition of these enzymes for
therapeutic means. More importantly, the secreted nature and/or
high level expression of many of these proteases allows for their
detection at aberrant levels in patient serum, e.g., the
prostate-specific antigen (PSA), which allows for early diagnosis of prostate
cancer [7].

Proteases have been associated directly with tumor
growth, shedding of tumor cells and invasion of target organs.
Individual classes of proteases are involved in, but not limited to,
(1) the digestion of stroma surrounding the initial tumor area; (2)
the digestion of the cellular adhesion molecules to allow
dissociation of tumor cells; and (3) the invasion of the basement
membrane for metastatic growth and the activation of both tumor
growth factors and angiogenic factors.

For many forms of cancer, diagnosis and treatment have
improved dramatically in the last 10 years. However, the
five-year survival rate for ovarian cancer remains below 50% due in
large part to the vague symptoms which allow for progression of
the disease to an advanced stage prior to diagnosis [8]. Although
the exploitation of the CA125 antigen has been useful as a marker
for monitoring recurrence of ovarian cancer, it has not proven to
be an ideal marker for early diagnosis. Therefore, new markers
that may be secreted or released from cells and which are highly
expressed by ovarian tumors could provide a useful tool for the
early diagnosis and for therapeutic intervention in patients with
ovarian carcinoma.

The prior art is deficient in the lack of the complete
identification of the proteases overexpressed in carcinoma and is
therefore deficient in the lack of a tumor marker useful as an
indicator of early disease, particularly for ovarian cancers.
Specifically, TADG-12, a transmembrane serine protease, has not
been previously identified in either nucleic acid or protein form.
The present invention fulfills this long-standing need and desire
in the art.

SUMMARY OF THE INVENTION 

The present invention discloses TADG-12, a new
member of the Tumor Associated Differentially-Expressed Gene
(TADG) family, and a variant splicing form of TADG-12
(TADG-12V) that could lead to a truncated protein product. TADG-12 is a
transmembrane serine protease overexpressed in ovarian
carcinoma. The entire cDNA of TADG-12 has been identified (SEQ
ID No. 1). This sequence encodes a putative protein of 454 amino
acids (SEQ ID No. 2) which includes a potential transmembrane
domain, an LDL receptor like domain, a scavenger receptor
cysteine rich domain, and a serine protease domain. These
features imply that TADG-12 is expressed at the cell surface, and
it may be used as a molecular target for therapy or a diagnostic
marker.



In one embodiment of the present invention, there is
provided a DNA fragment encoding a TADG-12 protein selected
from the group consisting of: (a) an isolated DNA fragment which
encodes a TADG-12 protein; (b) an isolated DNA fragment which
hybridizes to the isolated DNA fragment of (a) above and which
encodes a TADG-12 protein; and (c) an isolated DNA fragment
differing from the isolated DNA fragments of (a) and (b) above in
codon sequence due to the degeneracy of the genetic code, and
which encodes a TADG-12 protein. Specifically, the DNA fragment
has a sequence shown in SEQ ID No. 1 or SEQ ID No. 3.

In another embodiment of the present invention, there
is provided a vector/host cell capable of expressing the DNA of the
present invention.

In yet another embodiment of the present invention,
there is provided an isolated and purified TADG-12 protein
encoded by DNA selected from the group consisting of: (a) isolated
DNA which encodes a TADG-12 protein; (b) isolated DNA which
hybridizes to isolated DNA of (a) above and which encodes a
TADG-12 protein; and (c) isolated DNA differing from the isolated
DNAs of (a) and (b) above in codon sequence due to the
degeneracy of the genetic code, and which encodes a TADG-12
protein. Specifically, the TADG-12 protein has an amino acid
sequence shown in SEQ ID No. 2 or SEQ ID No. 4.

In still yet another embodiment of the present
invention, there is provided a method for detecting expression of a
TADG-12 protein, comprising the steps of: (a) contacting mRNA
obtained from a cell with a labeled hybridization probe; and
(b) detecting hybridization of the probe with the mRNA.



The present invention further provides methods for
diagnosing a cancer or other malignant hyperplasia by detecting
the TADG-12 protein or mRNA disclosed herein.

In still another embodiment of the present invention,
there is provided a method of inhibiting expression of endogenous
TADG-12 mRNA in a cell by introducing a vector into the cell,
wherein the vector comprises a DNA fragment of TADG-12 in
opposite orientation operably linked to elements necessary for
expression.

In still yet another embodiment of the present
invention, there is provided a method of inhibiting expression of a
TADG-12 protein in a cell by introducing an antibody directed
against a TADG-12 protein or fragment thereof.

In still yet another embodiment of the present
invention, there is provided a method of targeted therapy by
administering a compound having a targeting moiety specific for a
TADG-12 protein and a therapeutic moiety. Specifically, the
TADG-12 protein has an amino acid sequence shown in SEQ ID No.
2 or SEQ ID No. 4.

The present invention still further provides a method
of vaccinating an individual against TADG-12 by inoculating the
individual with a TADG-12 protein or fragment thereof.
Specifically, the TADG-12 protein has an amino acid sequence
shown in SEQ ID No. 2 or SEQ ID No. 4. The TADG-12 fragment
includes the truncated form of TADG-12V peptide having a
sequence shown in SEQ ID No. 8, and a 9-residue up to 12-residue
fragment of TADG-12 protein.

In yet another embodiment of the present invention,
there is provided an immunogenic composition, comprising an
immunogenic fragment of a TADG-12 protein and an appropriate
adjuvant. The TADG-12 fragment includes the truncated form of
TADG-12V peptide having a sequence shown in SEQ ID No. 8, and a
9-residue up to 12-residue fragment of TADG-12 protein.

Other and further aspects, features, and advantages of
the present invention will be apparent from the following
description of the presently preferred embodiments of the
invention given for the purpose of disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

So that the manner in which the above-recited features,
advantages and objects of the invention, as well as others which
will become clear, are attained and can be understood in detail,
more particular descriptions of the invention briefly summarized
above may be had by reference to certain embodiments thereof
which are illustrated in the appended drawings. These drawings
form a part of the specification. It is to be noted, however, that
the appended drawings illustrate preferred embodiments of the
invention and therefore are not to be considered limiting in their
scope.

Figure 1A shows that the expected PCR product of
approximately 180 bp and the unexpected PCR product of
approximately 300 bp using the redundant serine protease
primers were not amplified from normal ovary cDNA (Lane 1) but
were found in abundance from ovarian tumor cDNA (Lane 2). The
primer sequences for the PCR reactions are indicated by horizontal
arrows. Figure 1B shows that TADG-12 was subcloned from the
180 bp band while the larger 300 bp band was designated
TADG-12V. The sequences were found to overlap for 180 bp (SEQ ID No.
5 for nucleotide sequence, SEQ ID No. 6 for deduced amino acid
sequence) with the 300 bp TADG-12V (SEQ ID No. 7 for nucleotide
sequence, SEQ ID No. 8 for deduced amino acid sequence) having
an additional insert of 133 bases. This insertion (vertical arrow)
leads to a frame shift, which causes the TADG-12V transcript to
potentially produce a truncated form of TADG-12 with a variant
amino acid sequence.
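
The frame shift noted for Figure 1B follows from simple codon arithmetic: an insertion preserves the reading frame only when its length is a multiple of three nucleotides. A minimal Python sketch of that check is given below. The function name `causes_frameshift` is a hypothetical label chosen for illustration, and the sketch looks only at insert length; it does not examine SEQ ID No. 7 itself.

```python
CODON_LENGTH = 3  # nucleotides per codon


def causes_frameshift(insert_length):
    """Return True when an insertion of this length shifts the reading frame.

    Only insertions whose length is a multiple of three nucleotides
    leave the downstream codons in their original frame.
    """
    return insert_length % CODON_LENGTH != 0


insert_length = 133  # bases, as stated for the TADG-12V insert
leftover = insert_length % CODON_LENGTH
print(f"{insert_length} bases = {insert_length // CODON_LENGTH} codons + {leftover} base(s)")
print("frame shift:", causes_frameshift(insert_length))
```

Because 133 is not divisible by three, every codon downstream of the insert is read in a different frame, which is consistent with the variant amino acid sequence and potential truncation described for TADG-12V.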

Figure 2 shows that Northern blot analysis for TADG-12
revealed three transcripts of 2.4, 1.6 and 0.7 kilobases. These
transcripts were found at significant levels in ovarian tumors and
cancer cell lines, but the transcripts were found only at low levels
in normal ovary.

Figure 3 shows an RNA dot blot (CLONTECH) probed
for TADG-12. The transcript was detectable (at background
levels) in all 50 of the human tissues represented with the
greatest abundance of transcript in the heart. Putamen, amygdala,
kidney, liver, small intestine, skeletal muscle, and adrenal gland
were also found to have intermediate levels of TADG-12
transcript.

Figure 4 shows the entire cDNA sequence for TADG-12
(SEQ ID No. 1) with its predicted open reading frame of 454
amino acids (SEQ ID No. 2). Within the nucleotide sequence, the
Kozak's consensus sequence for the initiation of translation and
the poly-adenylation signal are underlined. In the protein
sequence, a potential transmembrane domain is boxed. The LDLR-A
domain is underlined with a solid line. The SRCR domain is
underlined with a broken line. The residues of the catalytic triad
of the serine protease domain are circled, and the beginning of the
catalytic domain is marked with an arrow designated as a
potential proteolytic cleavage site. The * represents the stop
codon that terminates translation.

Figure 5A shows the 35 amino acid LDLR-A domain
of TADG-12 (SEQ ID No. 13) aligned with other LDLR-A motifs
from the serine protease TMPRSS2 (U75329, SEQ ID No. 14), the
complement subunit C8 (P07358, SEQ ID No. 9), two LDLR-A
domains of the glycoprotein GP300 (P98164, SEQ ID Nos. 11-12),
and the serine protease matriptase (AF118224, SEQ ID No. 10).
TADG-12 has its highest similarity with the other serine proteases
for which it is 54% similar to TMPRSS2 and 53% similar to
matriptase. The highly conserved cysteine residues are shown in
bold type. Figure 5B shows the SRCR domain of TADG-12 (SEQ ID
No. 17) aligned with other domain family members including the
human macrophage scavenger receptor (P21757, SEQ ID No. 16),
human enterokinase (P98073, SEQ ID No. 19), bovine enterokinase
(P21758, SEQ ID No. 15), and the serine protease TMPRSS2 (SEQ ID
No. 18). Again, TADG-12 shows its highest similarity within this
region to the protease TMPRSS2 at 43%. Figure 5C shows the
protease domain of TADG-12 (SEQ ID No. 23) in alignment with
other human serine proteases including protease M (U62801, SEQ
ID No. 20), trypsinogen I (P07477, SEQ ID No. 21), plasma
kallikrein (P03952, SEQ ID No. 22), hepsin (P05981, SEQ ID No. 25),
and TMPRSS2 (SEQ ID No. 24). Cons represents the consensus
sequence for each alignment.

Figure 6 shows semi-quantitative PCR analysis that
was performed for TADG-12 (upper panel) and TADG-12V (lower
panel). The amplification of TADG-12 or TADG-12V was
performed in parallel with PCR amplification of β-tubulin product




as an internal control. The TADG-12 transcript was found to be 
overexpressed in 41 of 55 carcinomas. The TADG-12V transcript 
was found to be overexpressed in 8 of 22 carcinomas examined. 
Note that the samples in the upper panel are not necessarily the 
same as the samples in the lower panel.

Figure 7 shows immunohistochemical staining of
normal ovary and ovarian tumors, which was performed using a
polyclonal rabbit antibody developed to a TADG-12 specific
peptide. No significant staining was detected in normal ovary
(Figure 7A). Strong positive staining was observed in 22 of 29
carcinomas examined. Figures 7B and 7C represent a serous and 
mucinous carcinoma, respectively. Both show diffuse staining 
throughout the cytoplasm of tumor cells while stromal cells 
remain relatively unstained. 

Figure 8 is a model to demonstrate the progression of
TADG-12 within a cellular context. In normal circumstances, the
TADG-12 transcript is appropriately spliced and the resulting
protein is capable of being expressed at the cell surface where the
protease may be cleaved to an active form. The role of the
remaining ligand binding domains has not yet been determined,
but one can envision their potential to bind other molecules for
activation, internalization or both. The TADG-12V transcript,
which occurs in some tumors and may be the result of mutation
and/or poor mRNA processing, may be capable of producing a
truncated form of TADG-12 that does not have a functional
protease domain. In addition, this truncated product may present
a novel epitope at the surface of tumor cells.




DETAILED DESCRIPTION OF THE INVENTION 



To examine the serine proteases expressed by ovarian
cancers, a PCR-based differential display technique was employed
utilizing redundant PCR primers designed to the most highly
conserved amino acids in these proteins [9]. As a result, a novel
cell-surface, multi-domain serine protease, named Tumor
Associated Differentially-expressed Gene-12 (TADG-12), was
identified. TADG-12 appears to be overexpressed in many ovarian
tumors. The extracellular nature of TADG-12 may render tumors
susceptible to detection via a TADG-12 specific assay. In addition,
a splicing variant of TADG-12, named TADG-12V, was detected at
elevated levels in 35% of the tumors that were examined. TADG-12V
encodes a truncated form of TADG-12 with an altered amino
acid sequence that may be a unique tumor specific target for
future therapeutic approaches.

The TADG-12 cDNA is 2413 base pairs long (SEQ ID No.
1) encoding a 454 amino acid protein (SEQ ID No. 2). A variant
form, TADG-12V (SEQ ID No. 3), encodes a 294 amino acid protein
(SEQ ID No. 4). The availability of the TADG-12 and/or TADG-12V
gene opens the way for a number of studies that can lead to various
applications. For example, the TADG-12 and/or TADG-12V gene
can be used as a diagnostic or therapeutic target in ovarian
carcinoma and other carcinomas including breast, prostate, lung
and colon.

In accordance with the present invention there may be 
employed conventional molecular biology, microbiology, and 
recombinant DNA techniques within the skill of the art. Such 
techniques are explained fully in the literature. See, e.g., Maniatis, 




Fritsch & Sambrook, "Molecular Cloning: A Laboratory Manual"
(1982); "DNA Cloning: A Practical Approach," Volumes I and II
(D.N. Glover ed. 1985); "Oligonucleotide Synthesis" (M.J. Gait ed.
1984); "Nucleic Acid Hybridization" [B.D. Hames & S.J. Higgins eds.
(1985)]; "Transcription and Translation" [B.D. Hames & S.J. Higgins
eds. (1984)]; "Animal Cell Culture" [R.I. Freshney, ed. (1986)]; 
"Immobilized Cells And Enzymes" [IRL Press, (1986)]; B. Perbal, "A 
Practical Guide To Molecular Cloning" (1984). 

Therefore, if appearing herein, the following terms 
shall have the definitions set out below.

As used herein, the term "cDNA" shall refer to the DNA 
copy of the mRNA transcript of a gene. 

As used herein, the term "derived amino acid 
sequence" shall mean the amino acid sequence determined by
reading the triplet sequence of nucleotide bases in the cDNA.
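
By way of illustration only, reading a derived amino acid sequence from the triplets of a cDNA can be sketched in Python as follows. The sketch assumes the standard genetic code and an open reading frame that begins at the first ATG; the function name derive_amino_acids and the short example sequence are hypothetical and are not taken from SEQ ID Nos. 1-4.

```python
from itertools import product

# Standard genetic code written in T, C, A, G order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def derive_amino_acids(cdna: str) -> str:
    """Translate a cDNA from its first ATG to the first in-frame stop codon.

    Hypothetical helper for illustration; codons containing characters
    other than A, C, G or T are reported as 'X'.
    """
    seq = cdna.upper()
    start = seq.find("ATG")
    if start < 0:
        return ""  # no start codon found
    residues = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break  # the stop codon terminates translation
        residues.append(aa)
    return "".join(residues)


# Example: ATG GCT TGT TAA reads as Met-Ala-Cys followed by a stop codon.
print(derive_amino_acids("GGATGGCTTGTTAAGG"))  # MAC
```

The single-letter output follows the usual amino-terminus to carboxy-terminus orientation.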

As used herein, the term "screening a library" shall
refer to the process of using a labeled probe to check whether,
under the appropriate conditions, there is a sequence
complementary to the probe present in a particular DNA library.
In addition, "screening a library" could be performed by PCR.

As used herein, the term "PCR" refers to the 
polymerase chain reaction that is the subject of U.S. Patent Nos. 
4,683,195 and 4,683,202 to Mullis, as well as other improvements 
now known in the art.

The amino acids described herein are preferably in
the "L" isomeric form. However, residues in the "D" isomeric form
can be substituted for any L-amino acid residue, as long as the
desired functional property of immunoglobulin-binding is retained
by the polypeptide. NH2 refers to the free amino group present at




the amino terminus of a polypeptide. COOH refers to the free
carboxy group present at the carboxy terminus of a polypeptide.
In keeping with standard polypeptide nomenclature, J. Biol. Chem.,
243:3552-59 (1969), abbreviations for amino acid residues are
known in the art.

It should be noted that all amino-acid residue
sequences are represented herein by formulae whose left and
right orientation is in the conventional direction of amino-terminus
to carboxy-terminus. Furthermore, it should be noted
that a dash at the beginning or end of an amino acid residue
sequence indicates a peptide bond to a further sequence of one or
more amino-acid residues.

A "replicon" is any genetic element (e.g., plasmid, 
chromosome, virus) that functions as an autonomous unit of DNA
replication in vivo, i.e., capable of replication under its own
control. 

A "vector" is a replicon, such as plasmid, phage or 
cosmid, to which another DNA segment may be attached so as to 
bring about the replication of the attached segment. 

A "DNA molecule" refers to the polymeric form of
deoxyribonucleotides (adenine, guanine, thymine, or cytosine) in
either its single-stranded form or a double-stranded helix. This
term refers only to the primary and secondary structure of the
molecule, and does not limit it to any particular tertiary forms.
Thus, this term includes double-stranded DNA found, inter alia, in
linear DNA molecules (e.g., restriction fragments), viruses,
plasmids, and chromosomes. The structure is discussed herein
according to the normal convention of giving only the sequence in




the 5' to 3' direction along the nontranscribed strand of DNA (i.e.,
the strand having a sequence homologous to the mRNA). 

An "origin of replication" refers to those DNA 
sequences that participate in DNA synthesis.

A DNA "coding sequence" is a double-stranded DNA
sequence which is transcribed and translated into a polypeptide in
vivo when placed under the control of appropriate regulatory
sequences. The boundaries of the coding sequence are determined
by a start codon at the 5' (amino) terminus and a translation stop
codon at the 3' (carboxyl) terminus. A coding sequence can
include, but is not limited to, prokaryotic sequences, cDNA from
eukaryotic mRNA, genomic DNA sequences from eukaryotic (e.g.,
mammalian) DNA, and even synthetic DNA sequences. A
polyadenylation signal and transcription termination sequence
will usually be located 3' to the coding sequence.

Transcriptional and translational control sequences are 
DNA regulatory sequences, such as promoters, enhancers, 
polyadenylation signals, terminators, and the like, that provide for 
the expression of a coding sequence in a host cell. 

A "promoter sequence" is a DNA regulatory region
capable of binding RNA polymerase in a cell and initiating
transcription of a downstream (3' direction) coding sequence. For
purposes of defining the present invention, the promoter sequence
is bounded at its 3' terminus by the transcription initiation site
and extends upstream (5' direction) to include the minimum
number of bases or elements necessary to initiate transcription at
levels detectable above background. Within the promoter
sequence will be found a transcription initiation site, as well as
protein binding domains (consensus sequences) responsible for




the binding of RNA polymerase. Eukaryotic promoters often, but 
not always, contain "TATA" boxes and "CAT" boxes. Prokaryotic 
promoters contain Shine-Dalgarno sequences in addition to the -10
and -35 consensus sequences.

An "expression control sequence" is a DNA sequence
that controls and regulates the transcription and translation of
another DNA sequence. A coding sequence is "under the control"
of transcriptional and translational control sequences in a cell
when RNA polymerase transcribes the coding sequence into
mRNA, which is then translated into the protein encoded by the
coding sequence.

A "signal sequence" can be included near the coding 
sequence. This sequence encodes a signal peptide, N-terminal to 
the polypeptide, that communicates to the host cell to direct the
polypeptide to the cell surface or secrete the polypeptide into the
media, and this signal peptide is clipped off by the host cell before 
the protein leaves the cell. Signal sequences can be found 
associated with a variety of proteins native to prokaryotes and 
eukaryotes. 

The term "oligonucleotide", as used herein in referring
to the probe of the present invention, is defined as a molecule
comprised of two or more ribonucleotides, preferably more than 
three. Its exact size will depend upon many factors which, in turn, 
depend upon the ultimate function and use of the oligonucleotide. 

The term "primer" as used herein refers to an
oligonucleotide, whether occurring naturally as in a purified
restriction digest or produced synthetically, which is capable of 
acting as a point of initiation of synthesis when placed under 
conditions in which synthesis of a primer extension product, which 




is complementary to a nucleic acid strand, is induced, i.e., in the
presence of nucleotides and an inducing agent such as a DNA
polymerase and at a suitable temperature and pH. The primer
may be either single-stranded or double-stranded and must be
sufficiently long to prime the synthesis of the desired extension
product in the presence of the inducing agent. The exact length of
the primer will depend upon many factors, including temperature,
source of primer and use of the method. For example, for diagnostic
applications, depending on the complexity of the target sequence,
the oligonucleotide primer typically contains 15-25 or more
nucleotides, although it may contain fewer nucleotides.

The primers herein are selected to be "substantially"
complementary to different strands of a particular target DNA
sequence. This means that the primers must be sufficiently
complementary to hybridize with their respective strands.
Therefore, the primer sequence need not reflect the exact
sequence of the template. For example, a non-complementary
nucleotide fragment may be attached to the 5' end of the primer,
with the remainder of the primer sequence being complementary
to the strand. Alternatively, non-complementary bases or longer
sequences can be interspersed into the primer, provided that the
primer sequence has sufficient complementarity with the sequence
to hybridize therewith and thereby form the template for the
synthesis of the extension product.
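
As a rough illustration of "substantial" complementarity, the Python sketch below counts the fraction of primer bases that pair with a template site of the same length. It is a simplification under stated assumptions: both sequences are written 5' to 3', no gaps are allowed, and actual hybridization also depends on temperature, salt and sequence context. The names reverse_complement and complementary_fraction and the example sequences are hypothetical.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def complementary_fraction(primer: str, template_site: str) -> float:
    """Fraction of primer positions that base-pair with the template site.

    Both arguments are written 5' to 3' and must have the same length
    (an assumption of this sketch; no gapped alignment is attempted).
    """
    if len(primer) != len(template_site):
        raise ValueError("primer and template site must be the same length")
    target = reverse_complement(template_site)
    matches = sum(p == t for p, t in zip(primer.upper(), target))
    return matches / len(primer)


site = "ATGGCTTGTAAGGCC"                                # hypothetical 15-nt template site
print(complementary_fraction("GGCCTTACAAGCCAT", site))  # fully complementary primer
print(complementary_fraction("AAACTTACAAGCCAT", site))  # same primer with a mismatched 5' tail
```

In the second call only the three 5' bases fail to pair, which mirrors the case of a non-complementary fragment attached to the 5' end of an otherwise complementary primer.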

As used herein, the terms "restriction endonucleases"
and "restriction enzymes" refer to enzymes, each of which cuts
double-stranded DNA at or near a specific nucleotide sequence.

A cell has been "transformed" by exogenous or 
heterologous DNA when such DNA has been introduced inside the 




cell. The transforming DNA may or may not be integrated
(covalently linked) into the genome of the cell. In prokaryotes,
yeast, and mammalian cells, for example, the transforming DNA
may be maintained on an episomal element such as a plasmid.
With respect to eukaryotic cells, a stably transformed cell is one in
which the transforming DNA has become integrated into a
chromosome so that it is inherited by daughter cells through
chromosome replication. This stability is demonstrated by the
ability of the eukaryotic cell to establish cell lines or clones
comprised of a population of daughter cells containing the
transforming DNA. A "clone" is a population of cells derived from
a single cell or ancestor by mitosis. A "cell line" is a clone of a
primary cell that is capable of stable growth in vitro for many
generations.

Two DNA sequences are "substantially homologous"
when at least about 75% (preferably at least about 80%, and most
preferably at least about 90% or 95%) of the nucleotides match
over the defined length of the DNA sequences. Sequences that are
substantially homologous can be identified by comparing the
sequences using standard software available in sequence data
banks, or in a Southern hybridization experiment under, for
example, stringent conditions as defined for that particular
system. Defining appropriate hybridization conditions is within
the skill of the art. See, e.g., Maniatis et al., supra; DNA Cloning,
Vols. I & II, supra; Nucleic Acid Hybridization, supra.

A "heterologous" region of the DNA construct is an 
identifiable segment of DNA within a larger DNA molecule that is 
not found in association with the larger molecule in nature. Thus, 
when the heterologous region encodes a mammalian gene, the 




gene will usually be flanked by DNA that does not flank the
mammalian genomic DNA in the genome of the source organism.
In another example, the coding sequence is a construct in which the
coding sequence itself is not found in nature (e.g., a cDNA where
the genomic coding sequence contains introns, or synthetic
sequences having codons different than the native gene). Allelic
variations or naturally-occurring mutational events do not give
rise to a heterologous region of DNA as defined herein.

The labels most commonly employed for these studies
are radioactive elements, enzymes, chemicals which fluoresce
when exposed to ultraviolet light, and others. A number of
fluorescent materials are known and can be utilized as labels.
These include, for example, fluorescein, rhodamine, auramine,
Texas Red, AMCA blue and Lucifer Yellow. A particular detecting
material is anti-rabbit antibody prepared in goats and conjugated
with fluorescein through an isothiocyanate.

Proteins can also be labeled with a radioactive element
or with an enzyme. The radioactive label can be detected by any
of the currently available counting procedures. The preferred
isotope may be selected from 3H, 14C, 32P, 35S, 36Cl, 51Cr, 57Co, 58Co,
59Fe, 90Y, 125I, 131I, and 186Re.

Enzyme labels are likewise useful, and can be detected
by any of the presently utilized colorimetric, spectrophotometric,
fluorospectrophotometric, amperometric or gasometric techniques.
The enzyme is conjugated to the selected particle by reaction with
bridging molecules such as carbodiimides, diisocyanates,
glutaraldehyde and the like. Many enzymes which can be used in
these procedures are known and can be utilized. The preferred
are peroxidase, β-glucuronidase, β-D-glucosidase, β-D-galactosidase,
urease, glucose oxidase plus peroxidase and alkaline
phosphatase. U.S. Patent Nos. 3,654,090, 3,850,752, and 4,016,043
are referred to by way of example for their disclosure of alternate
labeling material and methods.

A particular assay system developed and utilized in
the art is known as a receptor assay. In a receptor assay, the
material to be assayed is appropriately labeled and then certain
cellular test colonies are inoculated with a quantity of both the
labeled and unlabeled material, after which binding studies are
conducted to determine the extent to which the labeled material
binds to the cell receptors. In this way, differences in affinity
between materials can be ascertained.

An assay useful in the art is known as a "cis/trans"
assay. Briefly, this assay employs two genetic constructs, one of
which is typically a plasmid that continually expresses a particular
receptor of interest when transfected into an appropriate cell line,
and the second of which is a plasmid that expresses a reporter
such as luciferase, under the control of a receptor/ligand complex.
Thus, for example, if it is desired to evaluate a compound as a
ligand for a particular receptor, one of the plasmids would be a
construct that results in expression of the receptor in the chosen
cell line, while the second plasmid would possess a promoter
linked to the luciferase gene in which the response element to the
particular receptor is inserted. If the compound under test is an
agonist for the receptor, the ligand will complex with the receptor,
and the resulting complex will bind the response element and
initiate transcription of the luciferase gene. The resulting
chemiluminescence is then measured photometrically, and dose
response curves are obtained and compared to those of known




ligands. The foregoing protocol is described in detail in U.S. Patent 
No. 4,981,784. 

As used herein, the term "host" is meant to include not
only prokaryotes but also eukaryotes such as yeast, plant and
animal cells. A recombinant DNA molecule or gene which encodes
a human TADG-12 protein of the present invention can be used to
transform a host using any of the techniques commonly known to
those of ordinary skill in the art. Especially preferred is the use of
a vector containing coding sequences for the gene which encodes a
human TADG-12 protein of the present invention for purposes of
prokaryote transformation. Prokaryotic hosts may include E. coli,
S. typhimurium, Serratia marcescens and Bacillus subtilis.
Eukaryotic hosts include yeasts such as Pichia pastoris,
mammalian cells and insect cells.

In general, expression vectors containing promoter
sequences which facilitate the efficient transcription of the
inserted DNA fragment are used in connection with the host. The
expression vector typically contains an origin of replication,
promoter(s), terminator(s), as well as specific genes which are
capable of providing phenotypic selection in transformed cells.
The transformed hosts can be fermented and cultured according to
means known in the art to achieve optimal cell growth.

The invention includes a substantially pure DNA
encoding a TADG-12 protein, a strand of which DNA will hybridize
at high stringency to a probe containing a sequence of at least 15
consecutive nucleotides of the sequence shown in SEQ ID No. 1 or
SEQ ID No. 3. The protein encoded by the DNA of this invention
may share at least 80% sequence identity (preferably 85%, more
preferably 90%, and most preferably 95%) with the amino acids




listed in SEQ ID No. 2 or SEQ ID No. 4. More preferably, the DNA 
includes the coding sequence of the nucleotides of Figure 4 (SEQ ID 
No. 1), or a degenerate variant of such a sequence. 

The probe to which the DNA of the invention
hybridizes preferably consists of a sequence of at least 20
consecutive nucleotides, more preferably 40 nucleotides, even
more preferably 50 nucleotides, and most preferably 100
nucleotides or more (up to 100%) of the coding sequence of the
nucleotides listed in Figure 4 (SEQ ID No. 1) or the complement
thereof. Such a probe is useful for detecting expression of TADG-12
in a human cell by a method including the steps of (a)
contacting mRNA obtained from the cell with the labeled 
hybridization probe; and (b) detecting hybridization of the probe 
with the mRNA. 

This invention also includes a substantially pure DNA
containing a sequence of at least 15 consecutive nucleotides
(preferably 20, more preferably 30, even more preferably 50, and
most preferably all) of the region from nucleotides 1 to 2413 of
the nucleotides listed in SEQ ID No. 1, or of the region from
nucleotides 1 to 2544 of the nucleotides listed in SEQ ID No. 3. The
present invention also comprises antisense oligonucleotides
directed against this novel DNA. Given the teachings of the
present invention, a person having ordinary skill in this art would
readily be able to develop antisense oligonucleotides directed
against this DNA.
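
The criterion of "at least 15 consecutive nucleotides" of a reference sequence can be illustrated with a simple window search. The Python sketch below is only an aid to reading the definition: it tests exact matches on one strand, and the function name shares_consecutive_run and the short reference string are hypothetical stand-ins, not the sequences of SEQ ID No. 1 or SEQ ID No. 3.

```python
def shares_consecutive_run(candidate: str, reference: str, min_len: int = 15) -> bool:
    """Return True if any window of min_len consecutive nucleotides of the
    candidate occurs, exactly and on the same strand, in the reference."""
    cand, ref = candidate.upper(), reference.upper()
    return any(
        cand[i:i + min_len] in ref
        for i in range(len(cand) - min_len + 1)
    )


reference = "ATGGCTTGTAAGGCCTTAGACCGTA"           # hypothetical 25-nt reference
candidate = "TTTT" + reference[5:20] + "CCCC"     # carries 15 consecutive reference bases
print(shares_consecutive_run(candidate, reference))    # True
print(shares_consecutive_run("ACGT" * 5, reference))   # False
```

A fuller check would also search the reverse complement of the reference, since either strand of a double-stranded DNA may carry the run.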

By "high stringency" is meant DNA hybridization and 
wash conditions characterized by high temperature and low salt 
concentration, e.g., wash conditions of 65°C at a salt concentration
of approximately 0.1 x SSC, or the functional equivalent thereof. 




For example, high stringency conditions may include hybridization
at about 42°C in the presence of about 50% formamide; a first
wash at about 65°C with about 2 x SSC containing 1% SDS; followed
by a second wash at about 65°C with about 0.1 x SSC.

By "substantially pure DNA" is meant DNA that is not
part of a milieu in which the DNA naturally occurs, by virtue of
separation (partial or total purification) of some or all of the
molecules of that milieu, or by virtue of alteration of sequences
that flank the claimed DNA. The term therefore includes, for
example, a recombinant DNA which is incorporated into a vector,
into an autonomously replicating plasmid or virus, or into the
genomic DNA of a prokaryote or eukaryote; or which exists as a
separate molecule (e.g., a cDNA or a genomic or cDNA fragment
produced by polymerase chain reaction (PCR) or restriction
endonuclease digestion) independent of other sequences. It also
includes a recombinant DNA which is part of a hybrid gene
encoding additional polypeptide sequence, e.g., a fusion protein.
Also included is a recombinant DNA which includes a portion of
the nucleotides shown in SEQ ID No. 3 which encodes an
alternative splice variant of TADG-12 (TADG-12V).

The DNA may have at least about 70% sequence
identity to the coding sequence of the nucleotides listed in SEQ ID
No. 1 or SEQ ID No. 3, preferably at least 75% (e.g. at least 80%),
and most preferably at least 90%. The identity between two
sequences is a direct function of the number of matching or
identical positions. When a subunit position in both of the two
sequences is occupied by the same monomeric subunit, e.g., if a
given position is occupied by an adenine in each of two DNA
molecules, then they are identical at that position. For example, if




7 positions in a sequence 10 nucleotides in length are identical to
the corresponding positions in a second 10-nucleotide sequence,
then the two sequences have 70% sequence identity. The length
of comparison sequences will generally be at least 50 nucleotides,
preferably at least 60 nucleotides, more preferably at least 75
nucleotides, and most preferably 100 nucleotides. Sequence
identity is typically measured using sequence analysis software
(e.g., Sequence Analysis Software Package of the Genetics
Computer Group, University of Wisconsin Biotechnology Center,
1710 University Avenue, Madison, WI 53705).
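
The position-by-position definition of sequence identity given above can be restated as a short Python sketch. It assumes two sequences of equal length that are already aligned without gaps; it is not a substitute for the alignment software referred to above, and the function name percent_identity and the example sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions occupied by the same nucleotide in two
    pre-aligned, ungapped sequences of equal length."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Seven of the ten positions match, so the identity is 70%.
print(percent_identity("ACGTACGTAC", "ACGTACGGGG"))  # 70.0
```

The example reproduces the worked case in the text, in which 7 identical positions out of 10 give 70% sequence identity.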

The present invention comprises a vector comprising a
DNA sequence which encodes a human TADG-12 protein and the
vector is capable of replication in a host which comprises, in
operable linkage: a) an origin of replication; b) a promoter; and c)
a DNA sequence coding for said protein. Preferably, the vector of
the present invention contains a portion of the DNA sequence
shown in SEQ ID No. 1 or SEQ ID No. 3. A "vector" may be defined
as a replicable nucleic acid construct, e.g., a plasmid or viral
nucleic acid. Vectors may be used to amplify and/or express
nucleic acid encoding a TADG-12 protein. An expression vector is
a replicable construct in which a nucleic acid sequence encoding a
polypeptide is operably linked to suitable control sequences
capable of effecting expression of the polypeptide in a cell. The
need for such control sequences will vary depending upon the cell
selected and the transformation method chosen. Generally, control
sequences include a transcriptional promoter and/or enhancer,
suitable mRNA ribosomal binding sites, and sequences which
control the termination of transcription and translation. Methods
which are well known to those skilled in the art can be used to




construct expression vectors containing appropriate
transcriptional and translational control signals. See, for example,
the techniques described in Sambrook et al., 1989, Molecular
Cloning: A Laboratory Manual (2nd Ed.), Cold Spring Harbor Press,
N.Y. A gene and its transcription control sequences are defined as
being "operably linked" if the transcription control sequences
effectively control the transcription of the gene. Vectors of the
invention include, but are not limited to, plasmid vectors and viral
vectors. Preferred viral vectors of the invention are those derived
from retroviruses, adenovirus, adeno-associated virus, SV40 virus,
or herpes viruses.

By a "substantially pure protein" is meant a protein
which has been separated from at least some of those components
which naturally accompany it. Typically, the protein is
substantially pure when it is at least 60%, by weight, free from the
proteins and other naturally-occurring organic molecules with
which it is naturally associated in vivo. Preferably, the purity of
the preparation is at least 75%, more preferably at least 90%, and
most preferably at least 99%, by weight. A substantially pure
TADG-12 protein may be obtained, for example, by extraction
from a natural source; by expression of a recombinant nucleic acid
encoding a TADG-12 polypeptide; or by chemically synthesizing
the protein. Purity can be measured by any appropriate method,
e.g., column chromatography such as immunoaffinity
chromatography using an antibody specific for TADG-12,
polyacrylamide gel electrophoresis, or HPLC analysis. A protein is
substantially free of naturally associated components when it is
separated from at least some of those contaminants which
accompany it in its natural state. Thus, a protein which is




chemically synthesized or produced in a cellular system different 
from the cell from which it naturally originates will be, by 
definition, substantially free from its naturally associated 
components. Accordingly, substantially pure proteins include 
eukaryotic proteins synthesized in E. coli, other prokaryotes, or
any other organism in which they do not naturally occur. 

In addition to substantially full-length proteins, the
invention also includes fragments (e.g., antigenic fragments) of the
TADG-12 protein. As used herein, "fragment," as applied to a
polypeptide, will ordinarily be at least 10 residues, more typically
at least 20 residues, and preferably at least 30 (e.g., 50) residues
in length, but less than the entire, intact sequence. Fragments of
the TADG-12 protein can be generated by methods known to those
skilled in the art, e.g., by enzymatic digestion of naturally
occurring or recombinant TADG-12 protein, by recombinant DNA
techniques using an expression vector that encodes a defined
fragment of TADG-12, or by chemical synthesis. The ability of a
candidate fragment to exhibit a characteristic of TADG-12 (e.g.,
binding to an antibody specific for TADG-12) can be assessed by
methods described herein. Purified TADG-12 or antigenic
fragments of TADG-12 can be used to generate new antibodies or
to test existing antibodies (e.g., as positive controls in a diagnostic
assay) by employing standard protocols known to those skilled in
the art. Included in this invention are polyclonal antisera
generated by using TADG-12 or a fragment of TADG-12 as the
immunogen in, e.g., rabbits. Standard protocols for monoclonal
and polyclonal antibody production known to those skilled in this
art are employed. The monoclonal antibodies generated by this
procedure can be screened for the ability to identify recombinant






TADG-12 cDNA clones, and to distinguish them from known cDNA 
clones. 

Further included in this invention are TADG-12
proteins which are encoded at least in part by portions of SEQ ID
No. 1 or SEQ ID No. 3, e.g., products of alternative mRNA splicing or
alternative protein processing events, or in which a section of
TADG-12 sequence has been deleted. The fragment, or the intact
TADG-12 polypeptide, may be covalently linked to another
polypeptide, e.g. which acts as a label, a ligand or a means to
increase antigenicity.

The invention also includes a polyclonal or monoclonal
antibody which specifically binds to TADG-12. The invention
encompasses not only an intact monoclonal antibody, but also an
immunologically-active antibody fragment, e.g., a Fab or (Fab)2
fragment; an engineered single chain Fv molecule; or a chimeric
molecule, e.g., an antibody which contains the binding specificity
of one antibody, e.g., of murine origin, and the remaining portions
of another antibody, e.g., of human origin.

In one embodiment, the antibody, or a fragment
thereof, may be linked to a toxin or to a detectable label, e.g. a
radioactive label, non-radioactive isotopic label, fluorescent label,
chemiluminescent label, paramagnetic label, enzyme label, or
colorimetric label. Examples of suitable toxins include diphtheria
toxin, Pseudomonas exotoxin A, ricin, and cholera toxin. Examples
of suitable enzyme labels include malate dehydrogenase,
staphylococcal nuclease, delta-5-steroid isomerase, alcohol
dehydrogenase, alpha-glycerol phosphate dehydrogenase, triose
phosphate isomerase, peroxidase, alkaline phosphatase,
asparaginase, glucose oxidase, beta-galactosidase, ribonuclease,
urease, catalase, glucose-6-phosphate dehydrogenase,
glucoamylase, acetylcholinesterase, etc. Examples of suitable
radioisotopic labels include 3H, 125I, 131I, 32P, 35S, 14C, etc.

Paramagnetic isotopes for purposes of in vivo
diagnosis can also be used according to the methods of this
invention. There are numerous examples of elements that are
useful in magnetic resonance imaging. For discussions on in vivo
nuclear magnetic resonance imaging, see, for example, Schaefer et
al., (1989) JACC 14, 472-480; Shreve et al., (1986) Magn. Reson.
Med. 3, 336-340; Wolf, G. L., (1984) Physiol. Chem. Phys. Med.
NMR 16, 93-95; Wesbey et al., (1984) Physiol. Chem. Phys. Med.
NMR 16, 145-155; Runge et al., (1984) Invest. Radiol. 19, 408-415.
Examples of suitable fluorescent labels include a fluorescein label,
an isothiocyanate label, a rhodamine label, a phycoerythrin label, a
phycocyanin label, an allophycocyanin label, an o-phthaldehyde
label, a fluorescamine label, etc. Examples of chemiluminescent
labels include a luminol label, an isoluminol label, an aromatic
acridinium ester label, an imidazole label, an acridinium salt label,
an oxalate ester label, a luciferin label, a luciferase label, an
aequorin label, etc.

Those of ordinary skill in the art will know of other
suitable labels which may be employed in accordance with the
present invention. The binding of these labels to antibodies or
fragments thereof can be accomplished using standard techniques
commonly known to those of ordinary skill in the art. Typical
techniques are described by Kennedy et al., (1976) Clin. Chim.
Acta 70, 1-31; and Schurs et al., (1977) Clin. Chim. Acta 81, 1-40.
Coupling techniques mentioned in the latter are the
glutaraldehyde method, the periodate method, the dimaleimide
method, and the m-maleimidobenzyl-N-hydroxy-succinimide ester
method. All of these methods are incorporated by reference
herein.

Also within the invention is a method of detecting
TADG-12 protein in a biological sample, which includes the steps
of contacting the sample with the labeled antibody, e.g.,
radioactively tagged antibody specific for TADG-12, and
determining whether the antibody binds to a component of the
sample.

As described herein, the invention provides a number
of diagnostic advantages and uses. For example, the TADG-12
protein disclosed in the present invention is useful in diagnosing
cancer in different tissues since this protein is highly
overexpressed in tumor cells. Antibodies (or antigen-binding
fragments thereof) which bind to an epitope specific for TADG-12
are useful in a method of detecting TADG-12 protein in a biological
sample for diagnosis of cancerous or neoplastic transformation.
This method includes the steps of obtaining a biological sample
(e.g., cells, blood, plasma, tissue, etc.) from a patient suspected of
having cancer, contacting the sample with a labeled antibody (e.g.,
radioactively tagged antibody) specific for TADG-12, and detecting
the TADG-12 protein using standard immunoassay techniques
such as an ELISA. Antibody binding to the biological sample
indicates that the sample contains a component which specifically
binds to an epitope within TADG-12.

Likewise, a standard Northern blot assay can be used
to ascertain the relative amounts of TADG-12 mRNA in a cell or
tissue obtained from a patient suspected of having cancer, in
accordance with conventional Northern hybridization techniques
known to those of ordinary skill in the art. This Northern assay
uses a hybridization probe, e.g., radiolabelled TADG-12 cDNA,
either containing the full-length, single stranded DNA having a
sequence complementary to SEQ ID No. 1 or SEQ ID No. 3, or a
fragment of that DNA sequence at least 20 (preferably at least 30,
more preferably at least 50, and most preferably at least 100)
consecutive nucleotides in length. The DNA hybridization probe
can be labeled by any of the many different methods known to
those skilled in this art.

Antibodies to the TADG-12 protein can be used in an
immunoassay to detect increased levels of TADG-12 protein
expression in tissues suspected of neoplastic transformation.
These same uses can be achieved with Northern blot assays and
analyses.

The present invention is directed to a DNA fragment
encoding a TADG-12 protein selected from the group consisting of:

(a) an isolated DNA fragment which encodes a TADG-12 protein;

(b) an isolated DNA fragment which hybridizes to the isolated DNA
fragment of (a) above and which encodes a TADG-12 protein; and

(c) an isolated DNA fragment differing from the isolated DNA
fragments of (a) and (b) above in codon sequence due to the
degeneracy of the genetic code, and which encodes a TADG-12
protein. Preferably, the DNA has the sequence shown in SEQ ID
No. 1 or SEQ ID No. 3. More preferably, the DNA encodes a
TADG-12 protein having the amino acid sequence shown in SEQ ID
No. 2 or SEQ ID No. 4.

The present invention is also directed to a vector
and/or a host cell capable of expressing the DNA of the present
invention. Preferably, the vector contains DNA encoding a
TADG-12 protein having the amino acid sequence shown in SEQ ID
No. 2 or SEQ ID No. 4. Representative host cells include bacterial
cells, yeast cells, mammalian cells and insect cells.

The present invention is also directed to an isolated
and purified TADG-12 protein coded for by DNA selected from the
group consisting of: (a) isolated DNA which encodes a TADG-12
protein; (b) isolated DNA which hybridizes to isolated DNA of (a)
above and which encodes a TADG-12 protein; and (c) isolated DNA
differing from the isolated DNAs of (a) and (b) above in codon
sequence due to the degeneracy of the genetic code, and which
encodes a TADG-12 protein. Preferably, the isolated and purified
TADG-12 protein has the amino acid sequence shown in SEQ ID No.
2 or SEQ ID No. 4.

The present invention is also directed to a method of
detecting expression of the TADG-12 protein described herein in a
cell, comprising the steps of: (a) contacting mRNA obtained from
the cell with a labeled hybridization probe; and (b) detecting
hybridization of the probe with the mRNA.

A number of potential applications are possible for the
TADG-12 gene and gene product including the truncated product
TADG-12V.

In one embodiment of the present invention, there is
provided a method for diagnosing a cancer by detecting a
TADG-12 protein in a biological sample, wherein the presence or
absence of a TADG-12 protein indicates the presence or absence of
a cancer. Preferably, the biological sample is selected from the
group consisting of blood, urine, saliva, tears, interstitial fluid,
ascites fluid, tumor tissue biopsy and circulating tumor cells. Still
preferably, the detection of TADG-12 protein is by means selected
from the group consisting of Northern blot, Western blot, PCR, dot
blot, ELISA sandwich assay, radioimmunoassay, DNA array chips
and flow cytometry. Such a method is used for detecting ovarian
cancer, breast cancer, lung cancer, colon cancer, prostate cancer
and other cancers in which TADG-12 is overexpressed.

In another embodiment of the present invention, there
is provided a method for detecting malignant hyperplasia by
detecting a TADG-12 protein or TADG-12 mRNA in a biological
sample. Further, by comparing the TADG-12 protein or TADG-12
mRNA to reference information, a diagnosis or a treatment can be
provided. Preferably, PCR amplification is used for detecting
TADG-12 mRNA, wherein the primers utilized are selected from
the group consisting of SEQ ID Nos. 28-31. Still preferably,
detection of a TADG-12 protein is by immunoaffinity to an
antibody directed against a TADG-12 protein.

In still another embodiment of the present invention,
there is provided a method of inhibiting expression of endogenous
TADG-12 mRNA in a cell by introducing a vector comprising a DNA
fragment of TADG-12 in opposite orientation operably linked to
elements necessary for expression. As a result, the vector
produces TADG-12 antisense mRNA in the cell, which hybridizes to
endogenous TADG-12 mRNA, thereby inhibiting expression of
endogenous TADG-12 mRNA.
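
The relationship between a fragment cloned in the opposite orientation and the endogenous transcript can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the six-base fragment is invented (it is not taken from SEQ ID No. 1 or SEQ ID No. 3), and the function names are hypothetical. It shows that the RNA read from the reversed insert is the reverse complement of the sense transcript and can therefore base-pair with it along its full length.

```python
# Hypothetical illustration: the fragment is invented and is NOT part of
# any TADG-12 sequence; function names are assumptions for this sketch.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA coding-strand sequence."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def sense_rna(coding_strand):
    """RNA read from a fragment cloned in the normal (sense) orientation."""
    return coding_strand.replace("T", "U")


def antisense_rna(coding_strand):
    """RNA read from the same fragment cloned in the opposite orientation."""
    return reverse_complement(coding_strand).replace("T", "U")


def pairs_with(rna_a, rna_b):
    """True if rna_a base-pairs antiparallel with rna_b over its full length."""
    if len(rna_a) != len(rna_b):
        return False
    return all(RNA_PAIRS[a] == b for a, b in zip(rna_a, reversed(rna_b)))


fragment = "ATGGCC"  # invented example fragment
print(sense_rna(fragment))
print(antisense_rna(fragment))
print(pairs_with(antisense_rna(fragment), sense_rna(fragment)))
```

The sketch models only sequence complementarity; it says nothing about expression levels or about how efficiently hybridization inhibits translation in a cell.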

In still yet another embodiment of the present
invention, there is provided a method of inhibiting expression of a
TADG-12 protein by introducing an antibody directed against a
TADG-12 protein or fragment thereof. As a result, the binding of
the antibody to the TADG-12 protein or fragment thereof inhibits
the expression of the TADG-12 protein.




TADG-12 gene products including the truncated form
can be used for targeted therapy. Specifically, a compound having
a targeting moiety specific for a TADG-12 protein and a
therapeutic moiety is administered to an individual in need of
such treatment. Preferably, the targeting moiety is selected from
the group consisting of an antibody directed against a TADG-12
protein and a ligand or ligand binding domain that binds a
TADG-12 protein. The TADG-12 protein has an amino acid sequence
shown in SEQ ID No. 2 or SEQ ID No. 4. Still preferably, the
therapeutic moiety is selected from the group consisting of a
radioisotope, a toxin, a chemotherapeutic agent, an immune
stimulant and a cytotoxic agent. Such a method can be used for
treating an individual having a disease selected from the group
consisting of ovarian cancer, lung cancer, prostate cancer, colon
cancer and other cancers in which TADG-12 is overexpressed.

In yet another embodiment of the present invention,
there is provided a method of vaccinating, or producing an
immune response in, an individual against TADG-12 by inoculating
the individual with a TADG-12 protein or fragment thereof.
Specifically, the TADG-12 protein or fragment thereof lacks
TADG-12 activity, and the inoculation elicits an immune response
in the individual, thereby vaccinating the individual against
TADG-12. Preferably, the individual has a cancer, is suspected of
having a cancer or is at risk of getting a cancer. Still preferably,
the TADG-12 protein has an amino acid sequence shown in SEQ ID
No. 2 or SEQ ID No. 4, while the TADG-12 fragment has a sequence
shown in SEQ ID No. 8, or is a 9-residue fragment up to a
20-residue fragment. Examples of 9-residue fragments are shown in
SEQ ID Nos. 35, 36, 55, 56, 83, 84, 97, 98, 119, 120, 122, 123 and
136.




In still yet another embodiment of the present
invention, there is provided an immunogenic composition,
comprising an immunogenic fragment of a TADG-12 protein and
an appropriate adjuvant. Preferably, the immunogenic fragment
of the TADG-12 protein has a sequence shown in SEQ ID No. 8, or is
a 9-residue fragment up to a 20-residue fragment. Examples of
9-residue fragments are shown in SEQ ID Nos. 35, 36, 55, 56, 83,
84, 97, 98, 119, 120, 122, 123 and 136.

The following examples are given for the purpose of
illustrating various embodiments of the invention and are not
meant to limit the present invention in any fashion.



EXAMPLE 1 

Tissue collection and storage 

Upon patient hysterectomy, bilateral
salpingo-oophorectomy, or surgical removal of neoplastic tissue, the
specimen was retrieved and placed on ice. The specimen was then
taken to the resident pathologist for isolation and identification of
specific tissue samples. Finally, the sample was frozen in liquid
nitrogen, logged into the laboratory record and stored at -80°C.
Additional specimens were frequently obtained from the
Cooperative Human Tissue Network (CHTN). These samples were
prepared by the CHTN and shipped on dry ice. Upon arrival, these
specimens were logged into the laboratory record and stored at
-80°C.

EXAMPLE 2 

mRNA Extraction and cDNA Synthesis

Sixty-nine ovarian tumors (4 benign tumors, 10 low
malignant potential tumors and 55 carcinomas) and 10 normal
ovaries were obtained from surgical specimens and frozen in
liquid nitrogen. The human ovarian carcinoma cell lines SW 626
and Caov 3, the human breast carcinoma cell lines MDA-MB-231
and MDA-MB-435S were purchased from the American Type
Culture Collection (Rockville, MD). Cells were cultured to
sub-confluency in Dulbecco's modified Eagle's medium, supplemented
with 10% (v/v) fetal bovine serum and antibiotics.

Extraction of mRNA and cDNA synthesis were carried
out by the methods described previously [14-16]. mRNA was
isolated by using a RiboSep mRNA isolation kit (Becton Dickinson
Labware). In this procedure, poly A+ mRNA was isolated directly
from the tissue lysate using the affinity chromatography media
oligo(dT) cellulose. cDNA was synthesized with 5.0 µg of mRNA by
random hexamer priming using 1st strand cDNA synthesis kit
(CLONTECH).

EXAMPLE 3 

PCR with Redundant Primers and Cloning of TADG-12 cDNA

Redundant primers, forward
5'-TGGGTIGTIACIGCIGCICA(CT)TG-3' (SEQ ID No. 26) and reverse
5'-A(AG)IA(AG)IGCIATITCITTICC-3' (SEQ ID No. 27), for the
consensus sequences of amino acids surrounding the catalytic
triad for serine proteases were used to compare the PCR products
from normal and carcinoma cDNAs. The appropriate bands were
ligated into Promega T-vector plasmid and the ligation product
was used to transform JM109 cells (Promega) grown on selection
media. Individual colonies were selected and cultured, and
plasmid DNA was isolated by means of the Wizard miniprep
DNA purification system (Promega). Nucleotide sequencing was
performed using PRISM Ready Reaction Dye Deoxy terminator
cycle sequencing kit (Applied Biosystems). Applied Biosystems
Model 373A DNA sequencing system was used for direct cDNA
sequence determination.

The original TADG-12 subclone was randomly labeled
and used as a probe to screen an ovarian tumor cDNA library by
standard hybridization techniques [11,15]. The library was
constructed in λZAP using mRNA isolated from the tumor cells of a
stage III/grade III ovarian adenocarcinoma patient. Three
overlapping clones were obtained which spanned 2315
nucleotides. The final 99 nucleotides encoding the most 3'
sequence including the poly A tail were identified by homology
with clones available in the GenBank EST database.
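
As a reading aid for the redundant primers of SEQ ID Nos. 26 and 27, the following sketch parses the primer notation used above, in which `I` denotes inosine and a parenthesized group such as `(CT)` denotes a mixed-base position. The parser and its function names are hypothetical helpers written for this illustration; they are not part of the described method. The sketch reports the primer length, the number of inosine positions, and the fold degeneracy contributed by the mixed-base positions.

```python
# Hypothetical helper for reading the primer notation; not part of the
# described method. "I" = inosine, "(XY)" = one position with mixed bases.

def parse_degenerate_primer(primer):
    """Split a primer string into one entry per nucleotide position."""
    positions = []
    i = 0
    while i < len(primer):
        if primer[i] == "(":
            close = primer.index(")", i)
            positions.append(primer[i + 1:close])  # e.g. "CT" for (CT)
            i = close + 1
        else:
            positions.append(primer[i])
            i += 1
    return positions


def summarize_primer(primer):
    """Return length, inosine count and mixed-base fold degeneracy."""
    positions = parse_degenerate_primer(primer)
    degeneracy = 1
    for pos in positions:
        if len(pos) > 1:
            degeneracy *= len(pos)
    return {
        "length": len(positions),
        "inosines": sum(1 for pos in positions if pos == "I"),
        "mixed_base_degeneracy": degeneracy,
    }


forward = "TGGGTIGTIACIGCIGCICA(CT)TG"   # SEQ ID No. 26 as printed above
reverse = "A(AG)IA(AG)IGCIATITCITTICC"   # SEQ ID No. 27 as printed above

print(summarize_primer(forward))
print(summarize_primer(reverse))
```

Inosine positions are counted separately rather than multiplied into the degeneracy, because inosine is a single synthesized base that tolerates several pairing partners rather than a mixture of oligonucleotides.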



EXAMPLE 4

Quantitative PCR

The mRNA overexpression of TADG-12 was
determined using a quantitative PCR. Quantitative PCR was
performed according to the procedure as previously reported [16].
Oligonucleotide primers were used for: TADG-12, forward 5'-
GAAAGATGTGCTTGCrCTGG-3' (SEQ ID No. 28) and reverse 5'-
AGTAAGTTGGAGAGGGTGGT-3' (SEQ ID No. 29); the variant TADG-12,
forward 5'-TGGAGGTGGGTCTAGTTTGG-3' (SEQ ID No. 30), reverse
5'-GTGTTTGGGTTGTACTTGCr-3' (SEQ ID No. 31); β-tubulin, forward
5'- GGGATCAAGGTGTACTAGAA -3' (SEQ ID No. 32) and reverse 5'-
TAGGAGCTGGTGGACTGAGA -3' (SEQ ID No. 33). β-tubulin was
utilized as an internal control. The PCR reaction mixture consists
of cDNA derived from 50 ng of mRNA, 5 pmol of sense and
antisense primers for both the TADG-12 gene and the β-tubulin
gene, 200 µmol of dNTPs, 5 µCi of α-32P-dCTP and 0.25 unit of Taq
DNA polymerase with reaction buffer (Promega) in a final volume
of 25 µl. The target sequences were amplified in parallel with the
β-tubulin gene. Thirty cycles of PCR were carried out in a Thermal
Cycler (Perkin-Elmer Cetus). Each cycle of PCR included 30
seconds of denaturation at 94°C, 30 seconds of annealing at 60°C
and 30 seconds of extension at 72°C. The PCR products were
separated on 2% agarose gels and the radioactivity of each PCR
product was determined by using a Phospho Imager (Molecular
Dynamics). The present study used the expression ratio
(TADG-12/β-tubulin) as measured by phosphoimager to evaluate gene
expression and defined the value at mean + 2SD of normal ovary
as the cut-off value to determine overexpression. The Student's t
test was used for comparison of the mean values of normal ovary
and tumors.
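
The cut-off rule stated above (mean + 2SD of the normal-ovary expression ratios) can be sketched as follows. The ratios in the example are invented placeholders, not data from this study, and the function names are hypothetical. The text does not say whether a sample or a population standard deviation was used; the sketch assumes the sample standard deviation.

```python
from statistics import mean, stdev

# Illustrative sketch only: the ratios below are invented placeholders.
# Assumption: "SD" is the sample standard deviation (statistics.stdev).


def overexpression_cutoff(normal_ratios):
    """Cut-off = mean + 2 SD of the target/beta-tubulin ratios in normal ovary."""
    return mean(normal_ratios) + 2 * stdev(normal_ratios)


def flag_overexpressed(tumor_ratios, cutoff):
    """Return True for each tumor whose ratio exceeds the cut-off."""
    return [ratio > cutoff for ratio in tumor_ratios]


normal = [0.10, 0.12, 0.08, 0.11, 0.09]   # hypothetical normal-ovary ratios
tumors = [0.13, 0.25, 0.14]               # hypothetical tumor ratios

cutoff = overexpression_cutoff(normal)
print(round(cutoff, 4))
print(flag_overexpressed(tumors, cutoff))
```

Because the cut-off is derived from the normal samples alone, a small normal group makes the threshold sensitive to a single outlying normal value.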



EXAMPLE 5 

Sequencing of TADG-12/TADG-12V

Utilizing a plasmid specific primer near the cloning
site, sequencing reactions were carried out using PRISM™ Ready
Reaction Dye Deoxy™ terminators (Applied Biosystems cat.#
401384) according to the manufacturer's instructions. Residual
dye terminators were removed from the completed sequencing
reaction using a Centri-sep™ spin column (Princeton Separation
cat.# CS-901). An Applied Biosystems Model 373A DNA
Sequencing System was available and was used for sequence
analysis.




EXAMPLE 6 

Antibody Production 

Polyclonal rabbit antibodies were generated by
immunization of white New Zealand rabbits with a poly-lysine
linked multiple antigen peptide derived from the TADG-12
carboxy-terminal protein sequence NH2-WIHEQMERDLKT-COOH
(WIHEQMERDLKT, SEQ ID No. 34). This peptide is present in full
length TADG-12, but not TADG-12V. Rabbits were immunized
with approximately 100 µg of peptide emulsified in Ribi adjuvant.
Subsequent boost immunizations were carried out at 3 and 6
weeks, and rabbit serum was isolated 10 days after the boost
inoculations. Sera were tested by dot blot analysis to determine
affinity for the TADG-12 specific peptide. Rabbit pre-immune
serum was used as a negative control.

EXAMPLE 7 

Northern Blot Analysis 

10 ng of mRNA were loaded onto a 1%
formaldehyde-agarose gel, electrophoresed and blotted on a
Hybond-N+ nylon membrane (Amersham). 32P-labeled cDNA probes
were made by Prime-a-Gene Labeling System (Promega). The PCR
products amplified by the same primers as above were used for
probes. The blots were prehybridized for 30 min and hybridized
for 60 min at 68°C with 32P-labeled cDNA probe in ExpressHyb
Hybridization Solution (CLONTECH). Control hybridization to
determine relative gel loading was performed with the β-tubulin
probe.




Normal human tissues (spleen, thymus, prostate, testis,
ovary, small intestine, colon and peripheral blood leukocyte) and
normal human fetal tissues (brain, lung, liver and kidney) (Human
Multiple Tissue Northern Blot; CLONTECH) were also examined by
the same hybridization procedure.

EXAMPLE 8 

Immunohistochemistry 

Immunohistochemical staining was performed using a
Vectastain Elite ABC Kit (Vector). Formalin fixed and paraffin
embedded specimens were routinely deparaffinized and processed
using microwave heat treatment in 0.01 M sodium citrate buffer
(pH 6.0). The specimens were incubated with normal goat serum
in a moist chamber for 30 minutes. TADG-12 peptide antibody
was allowed to incubate with the specimens in a moisture
chamber for 1 hour. Excess antibody was washed away with
phosphate buffered saline. After incubation with biotinylated
anti-rabbit IgG for 30 minutes, the sections were then incubated
with ABC reagent (Vector) for 30 minutes. The final products
were visualized using the AEC substrate system (DAKO) and
sections were counterstained with hematoxylin before mounting.
Negative controls were performed by using normal serum instead
of the primary antibody.

EXAMPLE 9

Isolation of Catalytic Domain Subclones of TADG-12 and TADG-12
Variant

To identify serine proteases that are expressed in
ovarian tumors, redundant PCR primers designed to the conserved
regions of the catalytic triad of these enzymes were employed. A
sense primer designed to the region surrounding the conserved
histidine and an anti-sense primer designed to the region
surrounding the conserved aspartate were used in PCR reactions
with either normal ovary or ovarian tumor cDNA as template. In
the reaction with ovarian tumor cDNA, a strong product band of
the expected size of approximately 180 bp was observed as well
as an unexpected PCR product of approximately 300 bp which
showed strong expression in some ovarian tumor cDNAs (Figure
1A). Both of these PCR products were subcloned and sequenced.
The sequence of the subclones from the 180 bp band (SEQ ID No. 5)
was found to be homologous to the sequence identified in the
larger, unexpected band (SEQ ID No. 7) except that the larger band
had an additional insert of 133 nucleotides (Figure 1B). The
smaller product of the appropriate size encoded a protein
sequence (SEQ ID No. 6) homologous to other known proteases,
while the sequence with the insertion (SEQ ID No. 8) encoded a
frame shift from the serine protease catalytic domain and a
subsequent premature translational stop codon. TADG-12 variants
from four individual tumors were also subcloned and sequenced.
The sequence and insert were found to be identical. The
genomic sequences for these cDNA derived clones were amplified
by PCR, examined and found to contain potential AG/GT splice
sites that would allow for the variant transcript production.

EXAMPLE 10 

Northern Blot Analysis of TADG-12 Expression

To examine transcript size and tissue distribution, the
catalytic domain subclone was randomly labeled and used to
probe Northern blots representing normal ovarian tissue, ovarian
tumors and the cancer cell lines SW626, CAOV3, HeLa,
MDA-MB-435S and MDA-MB-231 (Figure 2). Three transcripts of 2.4,
1.6 and 0.7 kilobases were observed. In blots of normal ovary and
ovarian tumor, the smallest transcript (0.7 kb) was expressed at
low levels in normal ovary, while all transcripts (2.4, 1.6 and
0.7 kb) were abundantly present in serous carcinoma. In addition,
Northern blots representing the normal human tissues spleen,
thymus, prostate, testis, ovary, small intestine, colon and
peripheral blood leukocyte, and normal human fetal tissues of
brain, lung, liver and kidney were examined. The same three
transcripts were found to be expressed weakly in all of these
tissues (data not shown). A human β-tubulin specific probe was
utilized as a control for relative sample loading. In addition, an
RNA dot blot representing 50 human tissues was probed, which
showed that this clone is weakly expressed in all tissues
represented (Figure 3). It was found most prominently in heart,
with intermediate levels in putamen, amygdala, kidney, liver,
small intestine, skeletal muscle, and adrenal gland.

EXAMPLE 11 

Sequencing and Characterization of TADG-12 

An ovarian tumor cDNA library constructed in λZAP
was screened by standard hybridization techniques using the
catalytic domain subclone as a probe. Two clones that overlapped
with the probe were identified and sequenced and found to
represent 2316 nucleotides. The 97 nucleotides at the 3' end of
the transcript including the poly-adenylation signal and the poly
(A) tail were identified by homology with clones available in
GenBank's EST database. This brought the total size of the
transcript to 2413 bases (SEQ ID No. 1, Figure 4). Subsequent
screening of GenBank's Genomic Database revealed that TADG-12
is homologous to a cosmid from chromosome 17. This cosmid has
the accession number AC015555.

The identified cDNA includes an open reading frame
that would produce a predicted protein of 454 amino acids (SEQ ID
No. 2), named Tumor Associated Differentially-Expressed Gene 12
(TADG-12). The sequence has been submitted to the GenBank
database and granted the accession # AF201380. Homology
alignment programs indicate that this protein contains several
domains including an amino-terminal cytoplasmic domain, a
potential Type II transmembrane domain followed by a low-density
lipoprotein receptor-like class A domain (LDLR-A), a scavenger
receptor cysteine rich domain (SRCR), and an extracellular serine
protease domain.

As predicted by the TMPred program, TADG-12 contains
a highly hydrophobic stretch of amino acids that could serve as a
potential transmembrane domain, which would retain the amino
terminus of the protein within the cytoplasm and expose the
ligand binding domains and protease domain to the extracellular
space. This general structure is consistent with other known
transmembrane proteases including hepsin [17], and TMPRSS2
[18], and TADG-12 is particularly similar in structure to the
TMPRSS2 protease.

The LDLR-A domain of TADG-12 is represented by the
sequence from amino acid 74 to 108 (SEQ ID No. 13). The LDLR-A
domain was originally identified within the LDL Receptor [19] as a
series of repeated sequences of approximately 40 amino acids,
which contained 6 invariant cysteine residues and highly
conserved aspartate and glutamate residues. Since that initial
identification, a host of other genes have been identified which
contain motifs homologous to this domain [20]. Several proteases
have been identified which contain LDLR-A motifs including
matriptase, TMPRSS2 and several complement components. A
comparison of TADG-12 with other known LDLR-A domains is
shown in Figure 5A. The similarity of these sequences ranges from
44 to 54% of similar or identical amino acids.

In addition to the LDLR-A domain, TADG-12 contains
another extracellular ligand binding domain with homology to the
group A SRCR family. This family of protein domains typically is
defined by the conservation of 6 cysteine residues within a
sequence of approximately 100 amino acids [23]. The SRCR
domain of TADG-12 is encoded by amino acids 109 to 206 (SEQ ID
No. 17), and this domain was aligned with other SRCR domains and
found to have between 36 and 43% similarity (Figure 5B).
However, TADG-12 only has 4 of the 6 conserved cysteine
residues. This is similar to the SRCR domain found in the protease
TMPRSS2.

The TADG-12 protein also includes a serine protease
domain of the trypsin family of proteases. An alignment of the
catalytic domain of TADG-12 with other known proteases is shown
in Figure 5C. The similarity among these sequences ranges from 48
to 55%, and TADG-12 is most similar to the serine protease
TMPRSS2 which also contains a transmembrane domain, LDLR-A
domain and an SRCR domain. There is a conserved amino acid
motif (RIVGG) downstream from the SRCR domain that is a
potential cleavage/activation site common to many serine
proteases of this family [25]. This suggests that TADG-12 is
trafficked to the cell surface where the ligand binding domains are
capable of interacting with extracellular molecules and the
protease domain is potentially activated. TADG-12 also contains
conserved cysteine residues (amino acids 208 and 243) which in
other proteases form a disulfide bond capable of linking the
activated protease to the other extracellular domains.

EXAMPLE 12 

Quantitative PCR Characterization of the Alternative Transcript

The original TADG-12 subclone was identified as
highly expressed in the initial redundant-primer PCR experiment.
The TADG-12 variant form (TADG-12V) with the insertion of 133
bp was also easily detected in the initial experiment. To identify
the frequency of this expression and whether or not the
expression level between normal ovary and ovarian tumors was
different, a previously authenticated semi-quantitative PCR
technique was employed [16]. The PCR analysis co-amplified a
product for β-tubulin with either a product specific to TADG-12 or
TADG-12V in the presence of a radiolabelled nucleotide. The
products were separated by agarose gel electrophoresis and a
phosphoimager was used to quantitate the relative abundance of
each PCR product. Examples of these PCR amplification products
are shown for both TADG-12 and TADG-12V in Figure 6. Normal
expression was defined as the mean ratio of TADG-12 (or
TADG-12V) to β-tubulin +/- 2SD as examined in normal ovarian
samples. For tumor samples, overexpression was defined as a ratio
more than 2SD above the normal mean TADG-12/β-tubulin or
TADG-12V/β-tubulin ratio. The results are summarized in Table 1
and Table 2. TADG-12 was found to be overexpressed in 41 of 55
carcinomas examined while the variant form was present at
aberrantly high levels in 8 of 22 carcinomas. As determined by the
Student's t test, these differences were statistically significant
(p < 0.05).

TABLE 1

Frequency of Overexpression of TADG-12 in Ovarian Carcinoma

Histology Type              TADG-12 (%)

Normal                      0/16 (0%)
LMP-Serous                  3/6 (50%)
LMP-Mucinous                0/4 (0%)
Serous Carcinoma            23/29 (79%)
Mucinous Carcinoma          7/12 (58%)
Endometrioid Carcinoma      8/8 (100%)
Clear Cell Carcinoma        3/6 (50%)
Benign Tumors               3/4 (75%)

Overexpression = more than two standard deviations above
the mean for normal ovary
LMP = low malignant potential tumor
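
The per-histology percentages in Table 1 and the overall carcinoma figure quoted in Example 12 (41 of 55) can be cross-checked with a short tally. The counts below are copied from Table 1; the dictionary layout and function names are hypothetical conveniences for this sketch.

```python
# Counts copied from Table 1 as (overexpressing, examined).
# The data structure and helper names are assumptions for illustration.
TABLE_1 = {
    "Normal": (0, 16),
    "LMP-Serous": (3, 6),
    "LMP-Mucinous": (0, 4),
    "Serous Carcinoma": (23, 29),
    "Mucinous Carcinoma": (7, 12),
    "Endometrioid Carcinoma": (8, 8),
    "Clear Cell Carcinoma": (3, 6),
    "Benign Tumors": (3, 4),
}


def percent(positive, total):
    """Percentage of overexpressing samples, rounded to a whole number."""
    return round(100 * positive / total)


def carcinoma_total(table):
    """Sum the rows whose histology type names a carcinoma."""
    rows = [counts for name, counts in table.items() if "Carcinoma" in name]
    return sum(p for p, _ in rows), sum(n for _, n in rows)


for name, (positive, total) in TABLE_1.items():
    print(f"{name}: {positive}/{total} ({percent(positive, total)}%)")

print(carcinoma_total(TABLE_1))
```

Summing only the four carcinoma rows reproduces the 41 of 55 carcinomas reported in the text; the normal, low malignant potential and benign rows are excluded from that total.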




TABLE 2

Frequency of Overexpression of TADG-12V in Ovarian Carcinoma

Histology Type              TADG-12V (%)

Normal                      [illegible]
LMP-Serous                  [illegible]
LMP-Mucinous                0/3 (0%)
Serous Carcinoma            4/14 (29%)
Mucinous Carcinoma          3/5 (60%)
Endometrioid Carcinoma      1/3 (33%)
Clear Cell Carcinoma        N/D

Overexpression = more than two standard deviations above
the mean for normal ovary; LMP = low malignant potential tumor

EXAMPLE 13 

Immunohistochemical Analysis of TADG-12 in Ovarian Tumor Cells

In order to examine the TADG-12 protein, polyclonal
rabbit antisera to a peptide located in the carboxy-terminal
amino acid sequence were developed. These antibodies were used
to examine the expression level of the TADG-12 protein and its
localization within normal ovary and ovarian tumor cells by
immuno-localization. No staining was observed in normal ovarian
tissues (Figure 7A) while significant staining was observed in 22
of 29 tumors studied. Representative tumor samples are shown in
Figures 7B and 7C. It should be noted that TADG-12 is found in a
diffuse pattern throughout the cytoplasm indicative of a protein in
a trafficking pathway. TADG-12 is also found at the cell surface in
these tumor samples as expected. It should be noted that the
antibody developed and used for immunohistochemical analysis
would not detect the TADG-12V truncated protein.

The results of the immunohistochemical staining are
summarized in Table 3. 22 of 29 ovarian tumors showed positive
staining of TADG-12, whereas normal ovarian surface epithelium
showed no expression of the TADG-12 antigen. 8 of 10 serous
adenocarcinomas, 8 of 8 mucinous adenocarcinomas, 1 of 2 clear
cell carcinomas, and 4 of 6 endometrioid carcinomas showed
positive staining.



TABLE 3

Case  Stage  Histology          Grade  LN*  TADG12  Prognosis

1            Normal ovary                   0-
2            Normal ovary                   0-
3            Normal ovary                   0-
4            Mucinous B                ND   0-      Alive
5            Mucinous B                ND   1+      Alive
6     1a     Serous LMP         G1     ND   1+      Alive
7     1a     Mucinous LMP       G1     ND   1+      Alive
8     1a     Mucinous CA        G1     ND   1+      Alive
9     1a     Mucinous CA        G2     ND   1+      Alive
10    1a     Endometrioid CA    G1     ND   0-      Alive
11    1c     Serous CA          G1     N    1+      Alive
12    1c     Mucinous CA        G1     N    1+      Alive
13    1c     Mucinous CA        G1     N    2+      Alive
14    1c     Clear cell CA      G2     N    0-      Alive
15    1c     Clear cell CA      G2     N    0-      Alive
16    2c     Serous CA          G3     N    2+      Alive
17    3a     Mucinous CA        G2     N    2+      Alive
18    3b     Serous CA          G1     ND   1+      Alive
19    3c     Serous CA          G1     N    0-      Dead
20    3c     Serous CA          G3     P    1+      Alive
21    3c     Serous CA          G2     P    2+      Alive
22    3c     Serous CA          G1     P    2+      Unknown
23    3c     Serous CA          G3     ND   2+      Alive
24    3c     Serous CA          G2     N    0-      Dead
25    3c     Mucinous CA        G1     P    2+      Dead
26    3c     Mucinous CA        G2     ND   1+      Unknown
27    3c     Mucinous CA        G2     N    1+      Alive
28    3c     Endometrioid CA    G1     P    1+      Dead
29    3c     Endometrioid CA    G2     N    0-      Alive
30    3c     Endometrioid CA    G2     P    1+      Dead
31    3c     Endometrioid CA    G3     P    1+      Alive
32    3c     Clear Cell CA      G3     P    2+      Dead

LN* = Lymph Node; B = Benign; N = Negative; P = Positive;
ND = Not Done



EXAMPLE 14

Peptide Ranking

For vaccine or immune stimulation, individual 9-mers
to 11-mers of the TADG-12 protein were examined to rank the
binding of individual peptides to the top 8 haplotypes in the
general population [Parker et al., (1994)]. The computer program
used for this analysis can be found at
<http://www-bimas.dcrt.nih.gov/molbio/hla_bind/>. Table 4 shows
the peptide ranking based upon the predicted half-life of each
peptide's binding to a particular HLA allele. A larger half-life
indicates a stronger association between that peptide and the
particular HLA molecule. The TADG-12 peptides that strongly bind
to an HLA allele are putative immunogens, and are used to
inoculate an individual against TADG-12.
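The relationship between the "Start" column of Table 4 and the protein sequence can be illustrated with a minimal sketch. It does not reproduce the cited ranking program; it only slides a 9-residue window over a sequence, reports 1-based start positions, and looks up values copied from Table 4. The fragment `N_TERMINUS` (residues 1-21 as read from FIG. 4 and the Table 4 peptides) and the names `windows`, `rank_listed` and `TABLE4_VALUES` are assumptions made for illustration.

```python
# Residues 1-21 of TADG-12 as read from FIG. 4 and the Table 4 peptides
# (an assumed fragment used only for illustration).
N_TERMINUS = "MGENDPPAVEAPFSFRSLFGL"

# Predicted half-life values copied from Table 4: peptide -> (HLA type, value).
TABLE4_VALUES = {
    "FSFRSLFGL": ("HLA A0201", 31.661),
    "AVEAPFSFR": ("HLA A1", 9.000),
    "ENDPPAVEA": ("HLA A1", 2.500),
    "DPPAVEAPF": ("HLA B4403", 4.500),
}


def windows(protein, k=9):
    """Return (1-based start, peptide) for every k-residue window."""
    return [(i + 1, protein[i:i + k]) for i in range(len(protein) - k + 1)]


def rank_listed(protein, values, k=9):
    """List windows that have a tabulated value, grouped by HLA type,
    largest value first within each type."""
    hits = [
        (start, pep, *values[pep])
        for start, pep in windows(protein, k)
        if pep in values
    ]
    return sorted(hits, key=lambda hit: (hit[2], -hit[3]))


for start, peptide, hla, half_life in rank_listed(N_TERMINUS, TABLE4_VALUES):
    print(f"{hla:<10} {start:>3}  {peptide}  {half_life:8.3f}")
```

Values are compared only within one HLA type, since Table 4 ranks peptides separately for each allele.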



TABLE 4

TADG-12 peptide ranking

HLA Type                          Predicted
& Ranking   Start   Peptide       Dissociation½    SEQ ID No.

HLA A0201
1           40      ILSLLPFEV     685.783          35
2           144     AQLGFPSYV     545.316          36
3           225     LLSQWPWQA     63.342           37
4           252     WIITAAHCV     43.992           38
5           356     VLNHAAVPL     36.316           39
6           176     LLPDDKVTA     34.627           40
7           13      FSFRSLFGL     31.661           41
8           151     YVSSDNLRV     27.995           42
9           436     RVTSFLDWI     21.502           43
10          234     SLQFQGYHL     21.362           44
11          181     KVTALHHSV     21.300           45
12          183     TALHHSVYV     19.658           46
13          411     RLWKLVGAT     18.494           47
14          60      LILALAIGL     18.476           48
15          227     SQWPWQASL     17.977           49
16          301     RLGNDIALM     11.426           50
17          307     ALMKLAGPL     10.275           51
18          262     DLYLPKSWT     9.837            52
19          416     LVGATSFGI     9.001            53
20          54      SLGIIALIL     8.759            54

HLA A0205
1           218     IVGGNMSLL     47.600           55
2           60      LILALAIGL     35.700           48
3           35      AVAAQILSL     28.000           56
4           307     ALMKLAGPL     21.000           51
5           271     IQVGLVSLL     19.040           57
6           397     CQGDSGGPL     16.800           58
7           227     SQWPWQASL     16.800           49
8           270     TIQVGLVSL     14.000           59
9           56      GIIALILAL     14.000           60
10          110     RVGGQNAVL     14.000           61
11          181     KVTALHHSV     12.000           45
12          151     YVSSDNLRV     12.000           42
13          356     VLNHAAVPL     11.900           39
14          144     AQLGFPSYV     9.600            36
15          13      FSFRSLFGL     7.560            41
16          54      SLGIIALIL     7.000            54
17          234     SLQFQGYHL     7.000            44
18          217     RIVGGNMSL     7.000            62
19          411     RLWKLVGAT     6.000            47
20          252     WIITAAHCV     6.000            38

HLA A1
1           130     CSDDWKGHY     37.500           63
2           8       AVEAPFSFR     9.000            64
3           328     NSEENFPDG     2.700            65
4           3       ENDPPAVEA     2.500            66
5           98      DCKDGEDEY     2.500            67
6           346     ATEDGGDAS     2.250            68
7           360     AAVPLISNK     2.000            69
8           153     SSDNLRVSS     1.500            70
9           182     VTALHHSVY     1.250            71
10          143     CAQLGFPSY     1.000            72
11          259     CVYDLYLPK     1.000            73
12          369     ICNHRDVYG     1.000            74
13          278     LLDNPAPSH     1.000            75
14          426     CAEVNKPGV     1.000            76
15          32      DADAVAAQI     1.000            77
16          406     VCQERRLWK     1.000            78
17          329     SEENFPDGK     0.900            79
18          303     GNDIALMKL     0.625            80
19          127     KTMCSDDWK     0.500            81
20          440     FLDWIHEQM     0.500            82

HLA A24
1           433     VYTRVTSFL     280.000          83
2           263     LYLPKSWTI     90.000           84
3           169     EFVSIDHLL     42.000           85
4           217     RIVGGNMSL     12.000           62
5           296     KYKPKRLGN     12.000           86
6           16      RSLFGLDDL     12.000           87
7           267     KSWTIQVGL     11.200           88
8           81      RSSFKCIEL     8.800            89
9           375     VYGGIISPS     8.000            90
10          110     RVGGQNAVL     8.000            91
11          189     VYVREGCAS     7.500            92
12          60      LILALAIGL     7.200            48
13          165     QFREEFVSI     7.200            93
14          271     IQVGLVSLL     7.200            57
15          56      GIIALILAL     7.200            60
16          10      EAPFSFRSL     7.200            94
17          307     ALMKLAGPL     7.200            51
18          407     CQERRLWKL     6.600            95
19          356     VLNHAAVPL     6.000            39
20          381     SPSMLCAGY     6.000            96

HLA B7
1           375     VYGGIISPS     200.000          97
2           381     SPSMLCAGY     80.000           98
3           362     VPLISNKIC     80.000           99
4           35      AVAAQILSL     60.000           56
5           373     RDVYGGIIS     40.000           100
6           307     ALMKLAGPL     36.000           51
7           283     APSHLVEKI     24.000           101
8           177     LPDDKVTAL     24.000           102
9           47      EVFSQSSSL     20.000           103
10          110     RVGGQNAVL     20.000           91
11          218     IVGGNMSLL     20.000           55
12          36      VAAQILSLL     12.000           104
13          255     TAAHCVYDL     12.000           105
14          10      EAPFSFRSL     12.000           94
15          138     YANVACAQL     12.000           106
16          195     CASGHVVTL     12.000           107
17          215     SSRIVGGNM     10.000           108
18          298     KPKRLGNDI     8.000            109
19          313     GPLTFNEMI     8.000            110
20          108     CVRVGGQNA     5.000            111

HLA B8
1           294     HSKYKPKRL     80.000           112
2           373     RDVYGGIIS     16.000           100
3           177     LPDDKVTAL     4.800            102
4           265     LPKSWTIQV     2.400            113
5           88      ELITRCDGV     2.400            114
6           298     KPKRLGNDI     2.000            109
7           81      RSSFKCIEL     2.000            89
8           375     VYGGIISPS     2.000            97
9           79      RCRSSFKCI     2.000            115
10          10      EAPFSFRSL     1.600            94
11          215     SSRIVGGNM     1.000            108
12          36      VAAQILSLL     0.800            104
13          255     TAAHCVYDL     0.800            116
14          381     SPSMLCAGY     0.800            98
15          195     CASGHVVTL     0.800            107
16          362     VPLISNKIC     0.800            99
17          138     YANVACAQL     0.800            106
18          207     ACGHRRGYS     0.400            117
19          154     SDNLRVSSL     0.400            118
20          47      EVFSQSSSL     0.400            103

HLA B2702
1           300     KRLGNDIAL     180.000          119
2           435     TRVTSFLDW     100.000          120
3           376     YGGIISPSM     100.000          121
4           410     RRLWKLVGA     60.000           122
5           210     HRRGYSSRI     60.000           123
6           227     SQWPWQASL     30.000           49
7           109     VRVGGQNAV     20.000           124
8           191     VREGCASGH     20.000           125
9           78      YRCRSSFKC     20.000           126
10          113     GQNAVLQVF     20.000           127
11          91      TRCDGVSDC     20.000           128
12          38      AQILSLLPF     20.000           129
13          211     RRGYSSRIV     18.000           130
14          216     SRIVGGNMS     10.000           131
15          118     LQVFTAASW     10.000           132
16          370     CNHRDVYGG     10.000           133
17          393     GVDSCQGDS     10.000           134
18          235     LQFQGYHLC     10.000           135
19          271     IQVGLVSLL     6.000            57
20          408     CQERRLWKL     6.000            95

HLA B4403
1           427     AEVNKPGVY     90.000           136
2           162     LEGQFREEF     40.000           137
3           9       VEAPFSFRS     24.000           138
4           318     NEMIQPVCL     12.000           139
5           256     AAHCVYDLY     9.000            140
6           98      DCKDGEDEY     9.000            67
7           46      FEVFSQSSS     8.000            141
8           38      AQILSLLPF     7.500            129
9           64      LAIGLGIHF     7.500            142
10          192     REGCASGHV     6.000            143
11          330     EENFPDGKV     6.000            144
12          182     VTALHHSVY     6.000            145
13          408     QERRLWKLV     6.000            146
14          206     TACGHRRGY     4.500            147
15          5       DPPAVEAPF     4.500            148
16          261     YDLYLPKSW     4.500            149
17          33      ADAVAAQIL     4.500            150
18          168     EEFVSIDHL     4.000            151
19          304     NDIALMKLA     3.750            152
20          104     DEYRCVRVG     3.600            153



Conclusion

In this study, a serine protease was identified by
means of a PCR-based strategy. By Northern blot, the largest
transcript for this gene is approximately 2.4 kb, and it is
expressed at high levels in ovarian tumors while found at
minimal levels in all other tissues examined. The full-length cDNA
encoding a novel multi-domain, cell-surface serine protease was
cloned and named TADG-12. The 454 amino acid protein contains a
cytoplasmic domain, a type II transmembrane domain, an LDLR-A
domain, an SRCR domain and a serine protease domain. Using a
semi-quantitative PCR analysis, it was shown that TADG-12 is
overexpressed in a majority of tumors studied.
Immunohistochemical staining corroborates that in some cases
this protein is localized to the cell surface of tumor cells, which
suggests that TADG-12 has some extracellular proteolytic
functions. Interestingly, TADG-12 also has a variant splicing form
that is present in 35% of the tumors studied. This variant mRNA
would lead to a truncated protein that may provide a unique
peptide sequence on the surface of tumor cells.
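How the variant transcript yields a truncated product can be checked with a short sketch that translates the start of the TADG-12V clone printed in FIG. 1B. This is an illustration under stated assumptions, not a method from the specification: the nucleotides are transcribed from FIG. 1B, the reading frame (0-based offset 2) is inferred from the residues printed beneath the sequence in that figure, and the names `CODON_TABLE`, `translate` and `TADG12V` are hypothetical.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, offset=0):
    """Translate dna from a 0-based offset, stopping at the first stop codon."""
    protein = []
    for pos in range(offset, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First 140 nucleotides of the TADG-12V clone as transcribed from FIG. 1B.
TADG12V = (
    "GGGTGGTGACGGCGGCGCACTGTGTTTATGAGATTGTAGCTCCTAGAGAAAGGGCAGACA"
    "GAAGAGGAAGGAAGCTCCTGTGCTGGAGGAAACCCACAAAAATGAAAGGACCTAGACCTT"
    "CCCATAGCTAATTCCAGTGG"
)

# Frame offset 2 is an assumption taken from the translation shown in FIG. 1B.
variant_peptide = translate(TADG12V, offset=2)
print(variant_peptide)
print(len(variant_peptide), "residues before the first in-frame stop codon")
```

In this frame the translation follows the TADG-12 sequence through VVTAAHCVY, then diverges and reaches a stop codon (TAA) shortly afterwards, which is consistent with the truncated product described above.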

This protein contains two extracellular domains which
might confer unusual properties to this multidomain molecule.
Although the precise role of LDLR-A function with regard to
proteases remains unclear, this domain certainly has the capacity
to bind calcium and other positively charged ligands [21,22]. This
may play an important role in the regulation of the protease or
subsequent internalization of the molecule. The SRCR domain was
originally identified within the macrophage scavenger receptor
and functionally described to bind lipoproteins. Not only are SRCR
domains capable of binding lipoproteins, but they may also bind to
molecules as diverse as polynucleotides [23]. More recent studies
have identified members of this domain family in proteins with
functions that vary from proteases to cell adhesion molecules
involved in maturation of the immune system [24]. In addition,
TADG-12, like TMPRSS2, has only four of six cysteine residues
conserved within its SRCR domain. This difference may allow for
different structural features of these domains that confer unusual
ligand binding properties. At this time, only the function of the
CD6 encoded SRCR is well documented. In the case of CD6, the
SRCR domain binds to the cell adhesion molecule ALCAM [23].
This mediation of cell adhesion is a useful starting point for future
research on newly identified SRCR domains; however, the
possibility of multiple functions for this domain cannot be
overlooked. SRCR domains are certainly capable of cell adhesion
type interactions, but their capacity to bind other types of ligands
should be considered.

At this time, the precise role of TADG-12 remains
unclear. Substrates have not been identified for the protease
domain, nor have ligands been identified for the extracellular
LDLR-A and SRCR domains. Figure 8 presents a working model of
TADG-12 with the information disclosed in the present invention.
Two transcripts are produced which lead to the production of
either TADG-12 or the truncated TADG-12V proteins. Either of
these proteins is potentially targeted to the cell surface. TADG-12
is capable of becoming an activated serine protease, while
TADG-12V is a truncated protein product that, if at the cell
surface, may represent a tumor specific epitope.

The problem with treatment of ovarian cancer today
remains the inability to diagnose the disease at an early stage.
Identifying genes that are expressed early in the disease process,
such as proteases that are essential for tumor cell growth [26], is
an important step toward improving treatment. With this
knowledge, it may be possible to design assays to detect the
highly expressed genes, such as the TADG-12 protease described
here or previously described proteases, to diagnose these cancers
at an earlier stage. Panels of markers may also provide prognostic
information and could lead to therapeutic strategies for individual
patients. Alternatively, inhibition of enzymes such as proteases
may be an effective means for slowing progression of ovarian
cancer and improving the quality of patient life. Other features of
TADG-12 and TADG-12V must be considered important to future
research too. The extracellular ligand binding domains are natural
targets for drug delivery systems. The aberrant peptide
associated with the TADG-12V protein may provide an excellent
target for drug delivery or for immune stimulation.

The following references were cited herein.

1. Duffy, M.J., Clin. Exp. Metastasis, 10: 145-155, 1992.

2. Monsky, W.L., et al., Cancer Res., 53: 3159-3164, 1993.

3. Powell, W.C., et al., Cancer Res., 53: 417-422, 1993.

4. Neurath, H. The Diversity of Proteolytic Enzymes. In: R.J.
Beynon and J.S. Bond (eds.), pp. 1-13, Proteolytic enzymes,
Oxford: IRL Press, 1989.

5. Liotta, L.A., et al., Cell, 64: 327-336, 1991.

6. Tryggvason, K., et al., Biochim. Biophys. Acta, 907: 191-217,
1987.

7. McCormack, R.T., et al., Urology, 45: 729-744, 1995.

8. Landis, S.H., et al., CA Cancer J. Clin., 48: 6-29, 1998.

9. Tanimoto, H., et al., Cancer Res., 57: 2884-2887, 1997.

10. Tanimoto, H., et al., Cancer, 86: 2074-2082, 1999.

11. Underwood, L.J., et al., Cancer Res., 59: 4435-4439, 1999.

12. Tanimoto, et al., Increased Expression of Protease M in Ovarian
Tumors. Tumor Biology, In Press, 2000.

13. Tanimoto, H., et al., Proc. of the Amer. Assoc. for Cancer
Research, 39: 648, 1998.

14. Tanimoto, H., et al., Tumor Biology, 20: 88-98, 1999.

15. Maniatis, T., Fritsch, E.F. & Sambrook, J. Molecular Cloning, p.
309-361, Cold Spring Harbor Laboratory, New York, 1982.

16. Shigemasa, K., et al., J. Soc. Gynecol. Invest., 4: 95-102, 1997.

17. Leytus, S.P., et al., Biochemistry, 27: 1067-1074, 1988.

18. Paoloni-Giacobino, A., et al., Genomics, 44: 309-320, 1997.

19. Sudhof, T.C., et al., Science, 228: 815-822, 1985.

20. Daly, N., et al., Proc. Natl. Acad. Sci. USA, 92: 6334-6338, 1995.

21. Mahley, R.W., Science, 240: 622-630, 1988.

22. Van Driel, I.R., et al., J. Biol. Chem., 262: 17443-17449, 1987.

23. Freeman, M., et al., Proc. Natl. Acad. Sci. USA, 87: 8810-8814,
1990.

24. Aruffo, A., et al., Immunology Today, 18(10): 498-504, 1997.

25. Rawlings, N.D., and Barrett, A.J., Methods Enzymology, 244:
19-61, 1994.

26. Torres-Rosado, A., et al., Proc. Natl. Acad. Sci. USA, 90: 7181-
7185, 1993.



Any patents or publications mentioned in this
specification are indicative of the levels of those skilled in the art
to which the invention pertains. These patents and publications
are herein incorporated by reference to the same extent as if each
individual publication was specifically and individually indicated
to be incorporated by reference.

One skilled in the art will readily appreciate that the
present invention is well adapted to carry out the objects and
obtain the ends and advantages mentioned, as well as those
inherent therein. The present examples along with the methods,
procedures, treatments, molecules, and specific compounds
described herein are presently representative of preferred
embodiments, are exemplary, and are not intended as limitations
on the scope of the invention. Changes therein and other uses will
occur to those skilled in the art which are encompassed within the
spirit of the invention as defined by the scope of the claims.






WHAT IS CLAIMED IS: 

1. A DNA fragment encoding Tumor Associated
Differentially-Expressed Gene-12 (TADG-12) protein selected from
the group consisting of:

(a) an isolated DNA fragment which encodes a
TADG-12 protein;

(b) an isolated DNA fragment which hybridizes to
isolated DNA fragment of (a) above and which encodes a TADG-12
protein; and

(c) an isolated DNA fragment differing from the
isolated DNA fragments of (a) and (b) above in codon sequence
due to the degeneracy of the genetic code, and which encodes a
TADG-12 protein.

2. The DNA fragment of claim 1, wherein said DNA
fragment has the sequence selected from the group consisting of
SEQ ID No. 1 and SEQ ID No. 3.

3. The DNA fragment of claim 1, wherein said
TADG-12 protein has the amino acid sequence selected from the
group consisting of SEQ ID No. 2 and SEQ ID No. 4.

4. A vector comprising the DNA fragment of claim 1
and regulatory elements necessary for expression of the DNA in a
cell.

5. The vector of claim 4, wherein said DNA
fragment encodes a TADG-12 protein having the amino acid
sequence selected from the group consisting of SEQ ID No. 2 and
SEQ ID No. 4.

6. A host cell transfected with the vector of claim 4,
said vector expressing a TADG-12 protein.

7. The host cell of claim 6, wherein said cell is
selected from the group consisting of a bacterial cell, a mammalian
cell, a plant cell and an insect cell.

8. The host cell of claim 7, wherein said bacterial
cell is E. coli.

9. An antisense oligonucleotide directed against the
DNA fragment of claim 1.

10. An isolated and purified TADG-12 protein coded
for by DNA selected from the group consisting of:

(a) isolated DNA which encodes a TADG-12 protein;

(b) isolated DNA which hybridizes to isolated DNA of
(a) above and which encodes a TADG-12 protein; and

(c) isolated DNA differing from the isolated DNAs of
(a) and (b) above in codon sequence due to the degeneracy of the
genetic code, and which encodes a TADG-12 protein.

11. The isolated and purified TADG-12 protein of
claim 10, wherein said TADG-12 protein has an amino acid
sequence selected from the group consisting of SEQ ID No. 2 and
SEQ ID No. 4.






12. A method for detecting expression of the TADG-12
protein of claim 10, comprising the steps of:

(a) contacting mRNA obtained from a cell with a
labeled hybridization probe; and

(b) detecting hybridization of the probe with the
mRNA.

13. An antibody directed against the TADG-12
protein of claim 10.

14. A method for diagnosing a cancer in an
individual, comprising the steps of:

(a) obtaining a biological sample from said
individual; and

(b) detecting a TADG-12 protein in said sample,
wherein the presence of a TADG-12 protein in said sample is
indicative of the presence of a cancer in said individual, wherein
the absence of a TADG-12 protein in said sample is indicative of
the absence of a cancer in said individual.

15. The method of claim 14, wherein said biological
sample is selected from the group consisting of blood, urine, saliva,
tears, interstitial fluid, ascites fluid, tumor tissue biopsy and
circulating tumor cells.

16. The method of claim 14, wherein said detection
of a TADG-12 protein is by means selected from the group
consisting of Northern blot, Western blot, PCR, dot blot, ELISA
sandwich assay, radioimmunoassay, DNA array chips and flow
cytometry.

17. The method of claim 14, wherein said cancer is
selected from the group consisting of ovarian cancer, breast
cancer, lung cancer, colon cancer, prostate cancer and other
cancers in which TADG-12 is overexpressed.

18. A method for detecting malignant hyperplasia in
a biological sample, comprising the steps of:

(a) isolating mRNA from said sample; and

(b) detecting TADG-12 mRNA in said sample,
wherein the presence of said TADG-12 mRNA in said sample is
indicative of the presence of malignant hyperplasia, wherein the
absence of said TADG-12 mRNA in said sample is indicative of the
absence of malignant hyperplasia.

19. The method of claim 18, further comprising the
step of comparing said TADG-12 mRNA to reference information,
wherein said comparison provides a diagnosis of said malignant
hyperplasia.

20. The method of claim 18, further comprising the
step of comparing said TADG-12 mRNA to reference information,
wherein said comparison determines a treatment of said
malignant hyperplasia.

21. The method of claim 18, wherein said detection
of TADG-12 mRNA is by PCR amplification.




22. The method of claim 21, wherein said PCR
amplification uses primers selected from the group consisting of
SEQ ID Nos. 28-31.

23. The method of claim 18, wherein said biological
sample is selected from the group consisting of blood, urine, saliva,
tears, interstitial fluid, ascites fluid, tumor tissue biopsy and
circulating tumor cells.

24. A method for detecting malignant hyperplasia in
a biological sample, comprising the steps of:

(a) isolating protein from said sample; and

(b) detecting a TADG-12 protein in said sample,
wherein the presence of a TADG-12 protein in said sample is
indicative of the presence of malignant hyperplasia, wherein the
absence of a TADG-12 protein in said sample is indicative of the
absence of malignant hyperplasia.

25. The method of claim 24, further comprising the
step of comparing said TADG-12 protein to reference information,
wherein said comparison provides a diagnosis of said malignant
hyperplasia.

26. The method of claim 24, further comprising the
step of comparing said TADG-12 protein to reference information,
wherein said comparison determines a treatment of said
malignant hyperplasia.




27. The method of claim 24, wherein said detection
is by immunoaffinity to an antibody, wherein said antibody is
directed against a TADG-12 protein.

28. The method of claim 24, wherein said biological
sample is selected from the group consisting of blood, urine, saliva,
tears, interstitial fluid, ascites fluid, tumor tissue biopsy and
circulating tumor cells.

29. A method of inhibiting expression of endogenous
TADG-12 mRNA in a cell, comprising the step of:

introducing a vector into a cell, wherein said vector
comprises a DNA fragment of TADG-12 in opposite orientation
operably linked to elements necessary for expression, wherein
expression of said vector in said cell produces TADG-12 antisense
mRNA, wherein said TADG-12 antisense mRNA hybridizes to
endogenous TADG-12 mRNA, thereby inhibiting expression of
endogenous TADG-12 mRNA in said cell.

30. A method of inhibiting expression of a TADG-12
protein in a cell, comprising the step of:

introducing an antibody into a cell, wherein said
antibody is directed against a TADG-12 protein or fragment
thereof, wherein binding of said antibody to said TADG-12 protein
or fragment thereof inhibits expression of said TADG-12 protein.

31. A method of targeted therapy to an individual,
comprising the step of:

administering a compound to an individual, wherein
said compound has a targeting moiety and a therapeutic moiety,
wherein said targeting moiety is specific for a TADG-12 protein.

32. The method of claim 31, wherein said targeting
moiety is selected from the group consisting of an antibody
directed against a TADG-12 protein and a ligand or ligand binding
domain that binds a TADG-12 protein.

33. The method of claim 32, wherein said TADG-12
protein has an amino acid sequence selected from the group
consisting of SEQ ID No. 2 and SEQ ID No. 4.

34. The method of claim 31, wherein said
therapeutic moiety is selected from the group consisting of a
radioisotope, a toxin, a chemotherapeutic agent, an immune
stimulant and a cytotoxic agent.

35. The method of claim 31, wherein said individual
suffers from a disease selected from the group consisting of
ovarian cancer, lung cancer, prostate cancer, colon cancer and
other cancers in which TADG-12 is overexpressed.

36. A method of vaccinating an individual against
TADG-12, comprising the step of inoculating the individual with a
TADG-12 protein or fragment thereof, wherein said TADG-12
protein or fragment thereof lacks TADG-12 activity, wherein said
inoculation with said TADG-12 protein or fragment thereof elicits
an immune response in said individual, thereby vaccinating said
individual against TADG-12.

37. The method of claim 36, wherein said individual
has a cancer, is suspected of having a cancer or is at risk of getting
a cancer.

38. The method of claim 36, wherein said TADG-12
protein has an amino acid sequence selected from the group
consisting of SEQ ID No. 2 and SEQ ID No. 4.

39. The method of claim 36, wherein said TADG-12
fragment has a sequence shown in SEQ ID No. 8.

40. The method of claim 36, wherein said TADG-12
fragment is a 9-residue fragment selected from the group
consisting of SEQ ID Nos. 35, 36, 55, 56, 83, 84, 97, 98, 119, 120,
122, 123 and 136.

41. An immunogenic composition, comprising an
immunogenic fragment of a TADG-12 protein and an appropriate
adjuvant.

42. The immunogenic composition of claim 41,
wherein said immunogenic fragment of a TADG-12 protein has a
sequence shown in SEQ ID No. 8.

43. The immunogenic composition of claim 41,
wherein said immunogenic fragment of a TADG-12 protein is a
9-residue fragment selected from the group consisting of SEQ ID Nos.
35, 36, 55, 56, 83, 84, 97, 98, 119, 120, 122, 123 and 136.



[Gel image not reproduced; band labels: Unexpected; Expected 180 bp;
Primer Dimers]

FIG. 1A

TADG12

  1 TGGGTGGTGACGGCGGCGCACTGTGTTTATGACTTGTACCTCCCCAAGTCATGGACCATC
    WVVTAAHCVYDLYLPKSWTI
 61 CAGGTGGGTCTAGTTTCCCTGTTGGACAATCCAGCCCCATCCCACTTGGTGGAGAAGATT
    QVGLVSLLDNPAPSHLVEKI
121 GTCTACCACAGCAAGTACAAGCCAAAGAGGCTGGGCAACGACATCGCCCTCCTA (SEQ ID NO. 5)
    VYHSKYKPKRLGNDIALL (SEQ ID NO. 6)

TADG12-V

  1 GGGTGGTGACGGCGGCGCACTGTGTTTATGAGATTGTAGCTCCTAGAGAAAGGGCAGACA
    VVTAAHCVYEIVAPRERADR
 61 GAAGAGGAAGGAAGCTCCTGTGCTGGAGGAAACCCACAAAAATGAAAGGACCTAGACCTT
    RGRKLLCWRKPTKMKGPRPS
121 CCCATAGCTAATTCCAGTGGACCATGTTATGGCAGATACAGGCTTGTACCTCCCCAAGTC
    HS* (SEQ ID NO. 8)
181 ATGGACCATCCAGGTGGGTCTAGTTTCCCTGTTGGACAATCCAGCCCCATCCCACTTGGT
241 GGAGAAGATTGTCTACCACAGCAAGTACAAGCCAAAGAGGCTGGGCAACGACATCGCCCT
301 CCTAATCACTAGTGCGGCCGCCTGCAGG (SEQ ID NO. 7)

FIG. 1B



[Blot image not reproduced; lane groups labeled: Cell Lines; Tissues]

FIG. 2



[RNA dot blot image not reproduced. Spotted samples, in the order
listed: whole brain, amygdala, caudate nucleus, cerebellum, cerebral
cortex, frontal lobe, hippocampus, medulla oblongata, occipital lobe,
putamen, substantia nigra, temporal lobe, thalamus, subthalamic
nucleus, spinal cord; heart, aorta, skeletal muscle, colon, bladder,
uterus, prostate, stomach, testis, ovary, pancreas, pituitary gland,
adrenal gland, thyroid gland, salivary gland, mammary gland; kidney,
liver, small intestine, spleen, thymus, peripheral leukocyte, lymph
node, bone marrow, appendix, lung, trachea, placenta; fetal brain,
fetal heart, fetal kidney, fetal liver, fetal spleen, fetal thymus,
fetal lung; yeast total RNA 100 ng, yeast tRNA 100 ng, E. coli rRNA
100 ng, E. coli DNA 100 ng, Poly r(A) 100 ng, human Cot1 DNA 100 ng,
human DNA 100 ng, human DNA 500 ng]

FIG. 3






1 CGGGAAAGGGCTGTGTTTATGGGAAGCCAGTAACACTGTGGCCTACTATCTCTTCCGTGG 
61 TGCCATCTACATTTTTGGGACTCGGGAATTATGAGGTAGAGGTGGAGGCGGAGCCGGATG 
121 TCAGAGGTCCTGAAATAGTCACCATGGGGGAAAATGATCCGCCTGCTGTTGAAGCCCCCT 

M G ENDPPAVEAPF13 
181 TCTCATTCCGATCGCTTTTTGGCCTTGATGATTTGAAAATAAGTCCTGTTGCACCAGATG 

SFRSLFGLCDLKISPVAPDA33 
241 CAGATGCTGTTGCTGCACAGATCCTGTCACTGCTGCCATTTGAAGTTTTTTCCCAATCAT 

DAVAAQILSLLPFEVF.S PO P S 53 
301 rr;TCATTGGGGATCATT GCATTGATATTAGCACTGGCCATTGGTCTG GGCATCCACTTCG 
I s L G IIALILALAIGlI g I H F D 

3 6 1 actgctcagggaagt acagatgtcgctcatcctttaagtgtatcgagctgataactcgat 

c sgkyrcrssfkci elit R C 93 
421 gtgacggagtctcggattgcaaagacggggaggacgagtaccgctgtgtccgggtgggtg 

DGVSDCKDGEDEY R . . C V__R V G G 113 

4 8 1 gtcagaatgccgtgctccaggtgttcacagctgcttcgtggaagaccatgtgctccgatg 

onavlqvf taaswktmcsdd 133 

5 41"XCTGGAAGGGTCACTACGCAAATGTTGCCTGTGCCCAACTGGGTTTCCCAAG 

WKGHYANVACAQL G F P S Y V S 15 3 

60i'"gTtcagat^^^^ 

sdnlrvsslegqfree F V S I 17 3 

661 "TCG AT CAC cTcTf CAG ATG ACAAGGT G ACT GCAT TAG ACC ACT C AGT AT AT GTG AG GG 

DHLLPDDKVTALHHSV Y R E 193 

7 2 1 ""AGGG AT GT G 

GCASGHVVTLQCTACGHRRG 213 
7 81 GCYaC AGC'^^^^^^^ 

Y S S ^ IVGGNMSLLSQWPWQA 233 
841 CCAGCCTTCAGTTCCAGGGCTACCACCTGTGCGGGGGCTCTGTCATCACGCCCCTGTGGA 

SliQFQGYHLCGGSVITPLWI 253 
901 TCATCACTGCTGCACACTGTGTTTATGACTTGTACCTCCCCAAGTCATGGACCATCCAGG 

I T A iO^^ C VYDLYLPKSWTIQV 273 
961 TGGGTCTAGTTTTCCTGTTGGACAATCCAGCCCCATCCCACTTGGTGGAGAAGATTGTCT 

GLVSLLDN PAPSHLVEKIVY 293 
1021 ACCACAGCAAGTACAAGCCAAAGAGGCTGGGCAATGACATCGCCCTTATGAAGCTGGCCG 

HSKYKPKRL gQ DIALMKLAG 313 
1081 GGCCACTCACGTTCAATGAAATGATCCAGCCrTGTGTGCCTGCCCAACTCTGAAGAGAACT 

PLTFNEMIQPVCLPNSEENF 333 
114 1 TCCCCGATGGAAAAGTGTGCTGGACGTCAGGATGGGGGGCCACAGAGGATGGAGGTGACG 

P DGKVCWTSGWGATEDGGDA 353 
12 01 CCTCCCCTGTCCTGAACCACGCGGCCGTCCCTTTGATTTCCAACAAGATCTGCAACCACA 

S PVLNHAAVPLISNKICNHR 373 

12 61 GGGACGTGTACGGTGGCATCATCTCCCCCTCCATGCTCTGCGCGGGCTACCTGACGGGTG 

DVYGGI I S P SMLCAGYLTGG 393 
1321 GCGTGGACAGCTGCCAGGGGGACAGCGGGGGGCCCCTGGTGTGTCAAGAGAGGAGGCTGT 

V D S C Q G (d) SGGPLVCQERRLW 413 

13 81 ggaagttagtgggagcgaccJtoctttggcatcggctgcgcagaggtgaacaagcctgggg 

KLVGATSFGIGCAEVNKPGV 433 

14 41 tgtacacccgtgtcacctccttcctggactggatccacgagcagatggagagagacctaa 

YTRVTSFLDWIHEQMERDLK 453 

15 01 aaacctgaagaggaaggggacaagtagccacctgagttcctgaggtgatgaagacagccc 

T * (SEQ id no. 2) 45 4 

15 61 GATCCTCCCCTGGACTCCCGTGTAGGAACCTGCACACGAGCAGACACCCTTGGAGCTCTG 

1621 AGTTCCGGCACCAGTAGCGGGCCCGAAAGAGGCACCCTTCCATCTGATTCCAGCACAACC 

1681 TTCAAGCTGCTTTTTGTTTTTTGTTTTTTTGAGGTGGAGTCTCGCTCTGTTGCCCAGGCT 

17 4 1 GGAGTGCAGTGGCGAAATACCCTGCTCACTGCAGCCTCCGCTTCCCTGGTTCAAGCGATT 

18 01 CTCTTGCCTCAGCTTCCCCAGTAGCTGGGACCACAGGTGCCCGCCACCACACCCAACTAA 
18 61 TTTTTGTATTTTTAGTAGAGACAGGGTTTCACCATGTTGGCCAGGCTGCTCTCAAACCCC 
1921 TGACCTCAAATGATGTGCCTGCTTCAGCCTCCCACAGTGCTGGGATTACAGGCATGGGCC 
1981 ACCACGCCTAGCCTCACGCTCCTTTCTGATCTTCACTAAGTVACAAAAGAAGCAGCAACTT 
2 04 1 GCAAGGGCGGCCTTTCCCACTGGTCCATCTGGTTTTCTCTCCAGGGTCTTGCAAAATTCC 
2101 TGACGAGATAAGCAGTTATGTGACCTCACGTGCAAAGCCACCAACAGCCACTCAGAAAAG 
2161 ACGCACCAGCCCAGAAGTGCAG7VACTGCAGTCACTGCACGTTTTCATCTTTAGGGACCAG 
22 21 AACCAAACCCACCCTTTCTACTTCCAAGACTTATTTTCACATGTGGGGAGGTTAATCTAG 
22 81 GAATGACTCGTTTAAGGCCTATTTTCATGATTTCTTTGTAGCATTTGGTGCTTGACGTAT 
2 3 4 1 TATTGTCCTTTGAT TCCAAAT AATATGTTTCCTTCCCTCAAAAAAAAAAAAAAAAAAAAA 
2 4 01 AAAAAAAAAAAAA (SEQ ID NO. 1) 

FIG. 4 






CompcS 
Matr 
Gp300-1 
Gp300-2 
TAD612 
Tmprss2 
Cons 



CEG. .FVC 
CPG . QFTC 
CQQGYFKC 
CSSHQITC 
CSGK . YRC 
CSNS6IEC 
C C 



AQTGRCVNRR 
. RTGRCIRKE 

QSEGQCIPSS 
. SKGQCIPSE 

RSSFKCXEXil 

DSSGTCIHPS 
C 



LLCM6DMDCG 
IJICDGWADCT 
WVCDQDQDCD 
YRCDHVRDCP 
TRCDGVSDCK 
NWCDGVSHCP 
C C 



DQSDEAM . C 


(SEQ 


ZD 


NO. 


9 ) 


DHSDELK . C 


(SEQ 


XD 


NO. 


10 ) 


DGSDERQDC 


(SEQ 


ID 


NO. 


11) 


DGADE.NDC 


(SEQ 


ZD 


NO. 


12 ) 


DGEDEYR . C 


(SEQ 


XD 


NO. 


13 ) 


GGEDENR . C 


(SEQ 


ZD 


NO. 


14 ) 


DE C 











FIG. 5A 



BovEntk VRI.VGGSGPH 

MacSR VRIiVGGSGPH 

TADG12 VRVGG . . . QN 
Tmprss2 ' VRLYG. . . PN 

HumEntk VRFFNGTTNN 

Cons VR 

BovEntk VHKRAYFGKG 

MacSR VHKAAHFGQG 

TADG12 SSDNIiRVSSL 

Tmprss2 SSQGIVDDSG 

HumEntk NSSKPXFSTD 
Cons 



EGRVEX.FHE GQWGTVCDDR 
EGRVEI.LHS GQWGTICDDR 
AVLQVFTA. . ASWKTMCSDD 
FIIiQMYSSQR KSWHPVCQDD 
NGLVRFRXQ . S IWHTACAEN 

W C 

TGPIWIiNEVF CFGK. .ESSX 
TGPIWIiNEVF CFGR, .ESSX 
EGQFREEFVS X.DHLLPDDK 
STSFMKIiNTS A.GNV. . .DX 
GGPFVKIiNTA PDGHLILTPS 



WELRGGLWC RSLGYKGVQS 
WEVRVGQWC RSI.GYPGVQA 
WKGHYANVAC AQLGFP . SYV 
WNENYG3EUVAC RDMGYKNNFY 
WTTQISNDVC QLLGLGSG.. 
W C 

EECRIRQWGV R.ACSHDEDA 
EECKIRQWGT R.ACSHSEDA 
VTAIiHHSVYV REGCASGHW 
YKKLYHS... .DACSSKAW 

QQ CLQDSLX 

C 



BovEntk GVTCT 

MacSR GVTCT 

TADG12 TLQCT 

Tmprs82 SLRCL 

HumEntk RLQC . 

Cons C 



(SEQ XD NO. 15) 

( SEQ ZD NO . 16) 

(SEQ ID NO. 17) 

(SEQ ID NO. 18) 

(SEQ ID NO. 19) 



FIG. 5B 



ProM      LWVLTAAHCK  KPNL        QVFLGKHNLR  QRESSQEQSS  VVRAVIHPDY
Tryl      QWVVSAGHCY  KSRI        QVRLGEHNIE  VLEGNEQFIN  AAKIIRHPQY
Kal       QWVLTAAHCF  D.GLPLQDVW  RIYSGILNLS  DITKDTPFSQ  IKEIIIHQNY
TADG12    LWIITAAHCV  .YDLYLPKSW  TIQVGLV..S  LLDNPAPSHL  VEKIVYHSKY
Tmprss2   EWIVTAAHCV  EKPLNNPWHW  TAFAGILRQS  FMFYGA.GYQ  VQKVISHPNY
Heps      DWVLTAAHCF  PERNRVLSRW  RVFAGAVAQA  SPHGLQLG..  VQAVVYHGGY
Cons       W   A HC                   G                         H  Y

ProM      DAAS        HDQDIMLLRL  ARPAKLSELI  QPLPLERDCS  ANT..TSCHI
Tryl      DRKT        LNNDIMLIKL  SSRAVINARV  STISLPTAPP  ATG..TKCLI
Kal       KVSE        GNHDIALIKL  QAPLNYTEFQ  KPICLPSKGD  TSTIYTNCWV
TADG12    KPKR        LGNDIALMKL  AGPLTFNEMI  QPVCLPNSEE  NFPDGKVCWT
Tmprss2   DSKT        KNNDIALMKL  QKPLTFNDLV  KPVCLPNPGM  MLQPEQLCWI
Heps      LPFRDPNSEE  NSNDIALVHL  SSPLPLTEYI  QPVCLPAAGQ  ALVDGKICTV
Cons                     DI L  L                                 C

ProM      LGWGKTAD..  GDFPDTIQCA  YIHLVSREEC  EHA..YPGQI  TQNMLCAGDE
Tryl      SGWGNTASSG  ADYPDELQCL  DAPVLSQAKC  EAS..YPGKI  TSNMFCVGFL
Kal       TGWGFSKEK.  GEIQNILQKV  NIPLVTNEEC  QKR.YQDYKI  TQRMVCAGYK
TADG12    SGWGAT.EDG  GDASPVLNHA  AVPLISNKIC  NHRDVYGGII  SPSMLCAGYL
Tmprss2   SGWGAT.EEK  GKTSEVLNAA  KVLLIETQRC  NSRYVYDNLI  TPAMICAGFL
Heps      TGWGNT.QYY  GQQAGVLQEA  RVPIISNDVC  NGADFYGNQI  KPKMFCAGYP
Cons       GWG                             C           I     M C G

ProM      KYGKDSCQGD  SGGPLVC  (SEQ ID NO. 20)
Tryl      EGGKDSCQGD  SGGPVVC  (SEQ ID NO. 21)
Kal       EGGKDACKGD  SGGPLVC  (SEQ ID NO. 22)
TADG12    TGGVDSCQGD  SGGPLVC  (SEQ ID NO. 23)
Tmprss2   QGNVDSCQGD  SGGPLVT  (SEQ ID NO. 24)
Heps      EGGIDACQGD  SGGPFVC  (SEQ ID NO. 25)
Cons          D C GD  SGGP V

FIG. 5C









TADG-12 
B-Tubulin 



B-Tubulin 
TADG-12V 




FIG. 6 







FIG. 7A 




FIG. 7B 




FIG. 7C 



SEQUENCE LISTING 



<110> O'Brien, Timothy J.
      Underwood, Lowell J.
<120> Transmembrane Serine Protease Overexpressed
      in Ovarian Carcinoma and Uses Thereof
<130> D6192PCT 
<141> 2000-03-02 
<150> 09/261,416 
<151> 1999-03-03 
<160> 153 



<210> 1 

<211> 2413 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<223> entire cDNA sequence of TADG-12 gene 

<400> 1 



cgggaaaggg ctgtgtttat gggaagccag taacactgtg gcctactatc 50 
tcttccgtgg tgccatctac atttttggga ctcgggaatt atgaggtaga 100 
ggtggaggcg gagccggatg tcagaggtcc tgaaatagtc accatggggg 150 
aaaatgatcc gcctgctgtt gaagccccct tctcattccg atcgcttttt 200 
ggccttgatg atttgaaaat aagtcctgtt gcaccagatg cagatgctgt 250 
tgctgcacag atcctgtcac tgctgccatt tgaagttttt tcccaatcat 300 
cgtcattggg gatcattgca ttgatattag cactggccat tggtctgggc 350 
atccacttcg actgctcagg gaagtacaga tgtcgctcat cctttaagtg 400 
tatcgagctg ataactcgat gtgacggagt ctcggattgc aaagacgggg 450 
aggacgagta ccgctgtgtc cgggtgggtg gtcagaatgc cgtgctccag 500 
gtgttcacag ctgcttcgtg gaagaccatg tgctccgatg actggaaggg 550 
tcactacgca aatgttgcct gtgcccaact gggtttccca agctatgtga 600 
gttcagataa cctcagagtg agctcgctgg aggggcagtt ccgggaggag 650 
tttgtgtcca tcgatcacct cttgccagat gacaaggtga ctgcattaca 700 
ccactcagta tatgtgaggg agggatgtgc ctctggccac gtggttacct 750 
tgcagtgcac agcctgtggt catagaaggg gctacagctc acgcatcgtg 800 
ggtggaaaca tgtccttgct ctcgcagtgg ccctggcagg ccagccttca 850 
gttccagggc taccacctgt gcgggggctc tgtcatcacg cccctgtgga 900 
tcatcactgc tgcacactgt gtttatgact tgtacctccc caagtcatgg 950 
accatccagg tgggtctagt ttccctgttg gacaatccag ccccatccca 1000 
cttggtggag aagattgtct accacagcaa gtacaagcca aagaggctgg 1050 
gcaatgacat cgcccttatg aagctggccg ggccactcac gttcaatgaa 1100 
atgatccagc ctgtgtgcct gcccaactct gaagagaact tccccgatgg 1150 
aaaagtgtgc tggacgtcag gatggggggc cacagaggat ggaggtgacg 1200 
cctcccctgt cctgaaccac gcggccgtcc ctttgatttc caacaagatc 1250 
tgcaaccaca gggacgtgta cggtggcatc atctccccct ccatgctctg 1300
cgcgggctac ctgacgggtg gcgtgaacag ctgccagggg gacagcgggg 1350 
ggcccctggt gtgtcaagag aggaggctgt ggaagttagt gggagcgacc 1400 
agctttggca tcggctgcgc agaggtgaac aagcctgggg tgtacacccg 1450 
tgtcacctcc ttcctggact ggatccacga gcagatggag agagacctaa 1500 
aaacctgaag aggaagggga caagtagcca cctgagttcc tgaggtgatg 1550 
aagacagccc gatcctcccc tggactcccg tgtaggaacc tgcacacgag 1600 
cagacaccct tggagctctg agttccggca ccagtagcgg gcccgaaaga 1650 
ggcacccttc catctgattc cagcacaacc ttcaagctgc tttttgtttt 1700 
ttgttttttt gaggtggagt ctcgctctgt tgcccaggct ggagtgcagt 1750 






ggcgaaatac cctgctcact gcagcctccg cttccctggt tcaagcgatt 1800 
ctcttgcctc agcttcccca gtagctggga ccacaggtgc ccgccaccac 1850 
acccaactaa tttttgtatt tttagtagag acagggtttc accatgttgg 1900 
ccaggctgct ctcaaacccc tgacctcaaa tgatgtgcct gcttcagcct 1950 
cccacagtgc tgggattaca ggcatgggcc accacgccta gcctcacgct 2000 
cctttctgat cttcactaag aacaaaagaa gcagcaactt gcaagggcgg 2050 
cctttcccac tggtccatct ggttttctct ccagggtctt gcaaaattcc 2100 
tgacgagata agcagttatg tgacctcacg tgcaaagcca ccaacagcca 2150 
ctcagaaaag acgcaccagc ccagaagtgc agaactgcag tcactgcacg 2200 
ttttcatctt tagggaccag aaccaaaccc accctttcta cttccaagac 2250 
ttattttcac atgtggggag gttaatctag gaatgactcg tttaaggcct 2300
attttcatga tttctttgta gcatttggtg cttgacgtat tattgtcctt 2350 
tgattccaaa taatatgttt ccttccctca aaaaaaaaaa aaaaaaaaaa 2400 
aaaaaaaaaa aaa 2413 

<210> 2 

<211> 454 

<212> PRT 

<213> Homo sapiens 

<220> 

<223> complete amino acid sequence of TADG-12 

protein 

<400> 2 



Met Gly Glu Asn Asp Pro Pro Ala Val Glu Ala Pro Phe Ser Phe
5 10 15
Arg Ser Leu Phe Gly Leu Asp Asp Leu Lys Ile Ser Pro Val Ala
20 25 30
Pro Asp Ala Asp Ala Val Ala Ala Gln Ile Leu Ser Leu Leu Pro
35 40 45
Phe Glu Val Phe Ser Gln Ser Ser Ser Leu Gly Ile Ile Ala Leu
50 55 60
Ile Leu Ala Leu Ala Ile Gly Leu Gly Ile His Phe Asp Cys Ser
65 70 75
Gly Lys Tyr Arg Cys Arg Ser Ser Phe Lys Cys Ile Glu Leu Ile
80 85 90
Thr Arg Cys Asp Gly Val Ser Asp Cys Lys Asp Gly Glu Asp Glu
95 100 105
Tyr Arg Cys Val Arg Val Gly Gly Gln Asn Ala Val Leu Gln Val
110 115 120
Phe Thr Ala Ala Ser Trp Lys Thr Met Cys Ser Asp Asp Trp Lys
125 130 135
Gly His Tyr Ala Asn Val Ala Cys Ala Gln Leu Gly Phe Pro Ser
140 145 150
Tyr Val Ser Ser Asp Asn Leu Arg Val Ser Ser Leu Glu Gly Gln
155 160 165
Phe Arg Glu Glu Phe Val Ser Ile Asp His Leu Leu Pro Asp Asp
170 175 180
Lys Val Thr Ala Leu His His Ser Val Tyr Val Arg Glu Gly Cys
185 190 195
Ala Ser Gly His Val Val Thr Leu Gln Cys Thr Ala Cys Gly His
200 205 210
Arg Arg Gly Tyr Ser Ser Arg Ile Val Gly Gly Asn Met Ser Leu
215 220 225
Leu Ser Gln Trp Pro Trp Gln Ala Ser Leu Gln Phe Gln Gly Tyr
230 235 240
His Leu Cys Gly Gly Ser Val Ile Thr Pro Leu Trp Ile Ile Thr
245 250 255
Ala Ala His Cys Val Tyr Asp Leu Tyr Leu Pro Lys Ser Trp Thr
260 265 270
Ile Gln Val Gly Leu Val Ser Leu Leu Asp Asn Pro Ala Pro Ser
275 280 285
His Leu Val Glu Lys Ile Val Tyr His Ser Lys Tyr Lys Pro Lys
290 295 300
Arg Leu Gly Asn Asp Ile Ala Leu Met Lys Leu Ala Gly Pro Leu
305 310 315
Thr Phe Asn Glu Met Ile Gln Pro Val Cys Leu Pro Asn Ser Glu
320 325 330
Glu Asn Phe Pro Asp Gly Lys Val Cys Trp Thr Ser Gly Trp Gly
335 340 345
Ala Thr Glu Asp Gly Gly Asp Ala Ser Pro Val Leu Asn His Ala
350 355 360
Ala Val Pro Leu Ile Ser Asn Lys Ile Cys Asn His Arg Asp Val
365 370 375
Tyr Gly Gly Ile Ile Ser Pro Ser Met Leu Cys Ala Gly Tyr Leu
380 385 390
Thr Gly Gly Val Asp Ser Cys Gln Gly Asp Ser Gly Gly Pro Leu
395 400 405
Val Cys Gln Glu Arg Arg Leu Trp Lys Leu Val Gly Ala Thr Ser
410 415 420
Phe Gly Ile Gly Cys Ala Glu Val Asn Lys Pro Gly Val Tyr Thr
425 430 435
Arg Val Thr Ser Phe Leu Asp Trp Ile His Glu Gln Met Glu Arg
440 445 450
Asp Leu Lys Thr

























<210> 3 

<211> 2544 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> CDS 

<223> entire cDNA sequence of TADG-12 variant gene 

<400> 3 



cgggaaaggg ctgtgtttat gggaagccag taacactgtg gcctactatc 50 

tcttccgtgg tgccatctac atttttggga ctcgggaatt atgaggtaga 100 

ggtggaggcg gagccggatg tcagaggtcc tgaaatagtc accatggggg 150 

aaaatgatcc gcctgctgtt gaagccccct tctcattccg atcgcttttt 200 

ggccttgatg atttgaaaat aagtcctgtt gcaccagatg cagatgctgt 250 

tgctgcacag atcctgtcac tgctgccatt tgaagttttt tcccaatcat 300

cgtcattggg gatcattgca ttgatattag cactggccat tggtctgggc 350

atccacttcg actgctcagg gaagtacaga tgtcgctcat cctttaagtg 400 

tatcgagctg ataactcgat gtgacggagt ctcggattgc aaagacgggg 450 

aggacgagta ccgctgtgtc cgggtgggtg gtcagaatgc cgtgctccag 500 

gtgttcacag ctgcttcgtg gaagaccatg tgctccgatg actggaaggg 550 

tcactacgca aatgttgcct gtgcccaact gggtttccca agctatgtaa 600 

gttcagataa cctcagagtg agctcgctgg aggggcagtt ccgggaggag 650

tttgtgtcca tcgatcacct cttgccagat gacaaggtga ctgcattaca 700 

ccactcagta tatgtgaggg agggatgtgc ctctggccac gtggttacct 750 

tgcagtgcac agcctgtggt catagaaggg gctacagctc acgcatcgtg 800 






ggtggaaaca tgtccttgct ctcgcagtgg ccctggcagg ccagccttca 850 

gttccagggc taccacctgt gcgggggctc tgtcatcacg cccctgtgga 900 

tcatcactgc tgcacactgt gtttatgaga ttgtagctcc tagagaaagg 950 

gcagacagaa gaggaaggaa gctcctgtgc tggaggaaac ccacaaaaat 1000 

gaaaggacct agaccttccc atagctaatt ccagtggacc atgttatggc 1050 

agatacaggc ttgtacctcc ccaagtcatg gaccatccag gtgggtctag 1100 

tttccctgtt ggacaatcca gccccatccc acttggtgga gaagattgtc 1150 

taccacagca agtacaagcc aaagaggctg ggcaatgaca tcgcccttat 1200

gaagctggcc gggccactca cgttcaatga aatgatccag cctgtgtgcc 1250 

tgcccaactc tgaagagaac ttccccgatg gaaaagtgtg ctggacgtca 1300

ggatgggggg ccacagagga tggaggtgac gcctcccctg tcctgaacca 1350 

cgcggccgtc cctttgattt ccaacaagat ctgcaaccac agggacgtgt 1400 

acggtggcat catctccccc tccatgctct gcgcgggcta cctgacgggt 1450 

ggcgtggaca gctgccaggg ggacagcggg gggcccctgg tgtgtcaaga 1500 

gaggaggctg tggaagttag tgggagcgac cagctttggc atcggctgcg 1550

cagaggtgaa caagcctggg gtgtacaccc gtgtcacctc cttcctggac 1600 

tggatccacg agcagatgga gagagaccta aaaacctgaa gaggaagggg 1650 

acaagtagcc acctgagttc ctgaggtgat gaagacagcc cgatcctccc 1700

ctggactccc gtgtaggaac ctgcacacga gcagacaccc ttggagctct 1750 

gagttccggc accagtagcg ggcccgaaag aggcaccctt ccatctgatt 1800 

ccagcacaac cttcaagctg ctttttgttt tttgtttttt tgaggtggag 1850 

tctcgctctg ttgcccaggc tggagtgcag tggcgaaata ccctgctcac 1900 

tgcagcctcc gcttccctgg ttcaagcgat tctcttgcct cagcttcccc 1950 

agtagctggg accacaggtg cccgccacca cacccaacta atttttgtat 2000 

ttttagtaga gacagggttt caccatgttg gccaggctgc tctcaaaccc 2050 

ctgacctcaa atgatgtgcc tgcttcagcc tcccacagtg ctgggattac 2100 

aggcatgggc caccacgcct agcctcacgc tcctttctga tcttcactaa 2150 

gaacaaaaga agcagcaact tgcaagggcg gcctttccca ctggtccatc 2200 

tggttttctc tccagggtct tgcaaaattc ctgacgagat aagcagttat 2250

gtgacctcac gtgcaaagcc accaacagcc actcagaaaa gacgcaccag 2300

cccagaagtg cagaactgca gtcactgcac gttttcatct ttagggacca 2350 

gaaccaaacc caccctttct acttccaaga cttattttca catgtgggga 2400 

ggttaatcta ggaatgactc gtttaaggcc tattttcatg atttctttgt 2450 

agcatttggt gcttgacgta ttattgtcct ttgattccaa ataatatgtt 2500 

tccttccctc aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaa 2544



<210> 4 

<211> 294 

<212> PRT 

<213> Homo sapiens 

<220> 

<223> complete amino acid sequence of TADG-12 

variant protein 

<400> 4 



Met Gly Glu Asn Asp Pro Pro Ala Val Glu Ala Pro Phe Ser Phe
5 10 15
Arg Ser Leu Phe Gly Leu Asp Asp Leu Lys Ile Ser Pro Val Ala
20 25 30
Pro Asp Ala Asp Ala Val Ala Ala Gln Ile Leu Ser Leu Leu Pro
35 40 45
Phe Glu Val Phe Ser Gln Ser Ser Ser Leu Gly Ile Ile Ala Leu
50 55 60
Ile Leu Ala Leu Ala Ile Gly Leu Gly Ile His Phe Asp Cys Ser
65 70 75
Gly Lys Tyr Arg Cys Arg Ser Ser Phe Lys Cys Ile Glu Leu Ile
80 85 90
Thr Arg Cys Asp Gly Val Ser Asp Cys Lys Asp Gly Glu Asp Glu
95 100 105
Tyr Arg Cys Val Arg Val Gly Gly Gln Asn Ala Val Leu Gln Val
110 115 120
Phe Thr Ala Ala Ser Trp Lys Thr Met Cys Ser Asp Asp Trp Lys
125 130 135
Gly His Tyr Ala Asn Val Ala Cys Ala Gln Leu Gly Phe Pro Ser
140 145 150
Tyr Val Ser Ser Asp Asn Leu Arg Val Ser Ser Leu Glu Gly Gln
155 160 165
Phe Arg Glu Glu Phe Val Ser Ile Asp His Leu Leu Pro Asp Asp
170 175 180
Lys Val Thr Ala Leu His His Ser Val Tyr Val Arg Glu Gly Cys
185 190 195
Ala Ser Gly His Val Val Thr Leu Gln Cys Thr Ala Cys Gly His
200 205 210
Arg Arg Gly Tyr Ser Ser Arg Ile Val Gly Gly Asn Met Ser Leu
215 220 225
Leu Ser Gln Trp Pro Trp Gln Ala Ser Leu Gln Phe Gln Gly Tyr
230 235 240
His Leu Cys Gly Gly Ser Val Ile Thr Pro Leu Trp Ile Ile Thr
245 250 255
Ala Ala His Cys Val Tyr Glu Ile Val Ala Pro Arg Glu Arg Ala
260 265 270
Asp Arg Arg Gly Arg Lys Leu Leu Cys Trp Arg Lys Pro Thr Lys
275 280 285
Met Lys Gly Pro Arg Pro Ser His Ser
290

<210> 5 
<211> 174 
<212> DNA 

<213> Artificial sequence 

<220> 

<223> nucleotide sequence of the subclone containing 

the 180 bp band from the PCR product for TADG-12 
<400> 5 

tgggtggtga cggcggcgca ctgtgtttat gacttgtacc tccccaagtc 50 
atggaccatc caggtgggtc tagtttccct gttggacaat ccagccccat 100 
cccacttggt ggagaagatt gtctaccaca gcaagtacaa gccaaagagg 150 
ctgggcaacg acatcgccct ccta 174 

<210> 6 
<211> 58 
<212> PRT 

<213> Artificial sequence 

<220> 

<223> deduced amino acid sequence of the 180 bp band 

from the PCR product for TADG-12 
<400> 6 

Trp Val Val Thr Ala Ala His Cys Val Tyr Asp Leu Tyr Leu Pro
5 10 15
Lys Ser Trp Thr Ile Gln Val Gly Leu Val Ser Leu Leu Asp Asn
20 25 30
Pro Ala Pro Ser His Leu Val Glu Lys Ile Val Tyr His Ser Lys
35 40 45
Tyr Lys Pro Lys Arg Leu Gly Asn Asp Ile Ala Leu Leu
50 55
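
As an illustrative cross-check that is not part of the original listing, the deduced amino acid sequence of SEQ ID NO. 6 can be reproduced by translating SEQ ID NO. 5 in its first reading frame. The Python sketch below assumes the standard genetic code; the names `CODON_TABLE` and `translate` are hypothetical helpers introduced only for this example.

```python
# Illustrative sketch only (not from the listing): translate SEQ ID NO. 5
# with the standard genetic code and compare the length with SEQ ID NO. 6.
from itertools import product

BASES = "tcag"
# Standard genetic code, codons ordered ttt, ttc, tta, ttg, tct, ... ggg.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# SEQ ID NO. 5 as printed in the listing (174 nucleotides).
SEQ_ID_NO_5 = (
    "tgggtggtga cggcggcgca ctgtgtttat gacttgtacc tccccaagtc"
    "atggaccatc caggtgggtc tagtttccct gttggacaat ccagccccat"
    "cccacttggt ggagaagatt gtctaccaca gcaagtacaa gccaaagagg"
    "ctgggcaacg acatcgccct ccta"
).replace(" ", "")


def translate(dna: str) -> str:
    """Translate a DNA string (reading frame 1) into one-letter amino acid codes."""
    dna = dna.lower()
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


protein = translate(SEQ_ID_NO_5)
print(len(SEQ_ID_NO_5), len(protein))
print(protein)
```

The one-letter output can be compared residue by residue with the three-letter sequence listed under `<400> 6`.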



<210> 7 
<211> 328 
<212> DNA 

<213> Artificial sequence 

<220> 

<223> nucleotide sequence of the subclone containing
      the 300 bp band from the PCR product for
      TADG-12 variant, which contains an additional
      insert of 133 bases

<400> 7 



gggtggtgac ggcggcgcac tgtgtttatg agattgtagc tcctagagaa 50 

agggcagaca gaagaggaag gaagctcctg tgctggagga aacccacaaa 100 

aatgaaagga cctagacctt cccatagcta attccagtgg accatgttat 150 

ggcagataca ggcttgtacc tccccaagtc atggaccatc caggtgggtc 200 

tagtttccct gttggacaat ccagccccat cccacttggt ggagaagatt 250 

gtctaccaca gcaagtacaa gccaaagagg ctgggcaacg acatcgccct 300

cctaatcact agtgcggccg cctgcagg 328 



<210> 8 
<211> 42 
<212> PRT 

<213> Artificial sequence 

<220> 

<223> deduced amino acid sequence of the 300 bp band
      from the PCR product for TADG-12 variant, which is
      a truncated form of TADG-12

<400> 8

Val Val Thr Ala Ala His Cys Val Tyr Glu Ile Val Ala Pro Arg
5 10 15
Glu Arg Ala Asp Arg Arg Gly Arg Lys Leu Leu Cys Trp Arg Lys
20 25 30
Pro Thr Lys Met Lys Gly Pro Arg Pro Ser His Ser
35 40



<210> 9 

<211> 34 

<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 

<223> LDLR-A domain of the complement subunit C8 

(Compc8) 

<400> 9 



Cys Glu Gly Phe Val Cys Ala Gln Thr Gly Arg Cys Val Asn Arg
5 10 15
Arg Leu Leu Cys Asn Gly Asp Asn Asp Cys Gly Asp Gln Ser Asp
20 25 30
Glu Ala Asn Cys



<210> 10 

<211> 34 

<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 

<223> LDLR-A domain of the serine protease 

matriptase (Matr) 
<400> 10 

Cys Pro Gly Gln Phe Thr Cys Arg Thr Gly Arg Cys Ile Arg Lys

5 10 15 

Glu Leu Arg Cys Asp Gly Trp Ala Asp Cys Thr Asp His Ser Asp 

20 25 30 

Glu Leu Asn Cys 



<210> 11 

<211> 37 

<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 

<223> LDLR-A domain of the glycoprotein GP300
      (Gp300-1)
<400> 11

Cys Gln Gln Gly Tyr Phe Lys Cys Gln Ser Glu Gly Gln Cys Ile
5 10 15
Pro Ser Ser Trp Val Cys Asp Gln Asp Gln Asp Cys Asp Asp Gly
20 25 30
Ser Asp Glu Arg Gln Asp Cys

35 

<210> 12 

<211> 35 

<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 

<223> LDLR-A domain of the glycoprotein GP300 

(Gp300-2) 
<400> 12 

Cys Ser Ser His Gln Ile Thr Cys Ser Asn Gly Gln Cys Ile Pro

5 10 15 

Ser Glu Tyr Arg Cys Asp His Val Arg Asp Cys Pro Asp Gly Ala 

20 25 30 

Asp Glu Asn Asp Cys 

35 

<210> 13 
<211> 35 






<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 

<222> 74...108

<223> LDLR-A domain of TADG-12

<400> 13

Cys Ser Gly Lys Tyr Arg Cys Arg Ser Ser Phe Lys Cys Ile Glu
5 10 15
Leu Ile Thr Arg Cys Asp Gly Val Ser Asp Cys Lys Asp Gly Glu

20 25 30 

Asp Glu Tyr Arg Cys 

35 



<210> 14 

<211> 36 

<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 

<223> LDLR-A domain of the serine protease TMPRSS2
      (Tmprss2)

<400> 14

Cys Ser Asn Ser Gly Ile Glu Cys Asp Ser Ser Gly Thr Cys Ile
5 10 15
Asn Pro Ser Asn Trp Cys Asp Gly Val Ser His Cys Pro Gly Gly
20 25 30
Glu Asp Glu Asn Arg Cys
35



<210> 15 

<211> 101 

<212> PRT 

<213> Bos taurus

<220> 

<221> DOMAIN 

<223> SRCR domain of bovine enterokinase (BovEntk) 

<400> 15 



Val Arg Leu Val Gly Gly Ser Gly Pro His Glu Gly Arg Val Glu
5 10 15
Ile Phe His Glu Gly Gln Trp Gly Thr Val Cys Asp Asp Arg Trp
20 25 30
Glu Leu Arg Gly Gly Leu Val Val Cys Arg Ser Leu Gly Tyr Lys
35 40 45
Gly Val Gln Ser Val His Lys Arg Ala Tyr Phe Gly Lys Gly Thr
50 55 60
Gly Pro Ile Trp Leu Asn Glu Val Phe Cys Phe Gly Lys Glu Ser
65 70 75
Ser Ile Glu Glu Cys Arg Ile Arg Gln Trp Gly Val Arg Ala Cys
80 85 90
Ser His Asp Glu Asp Ala Gly Val Thr Cys Thr
95 100














<210> 16 

<211> 101 

<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 

<223> SRCR domain of human macrophage scavenger 

receptor (MacSR) 

<400> 16 



Val Arg Leu Val Gly Gly Ser Gly Pro His Glu Gly Arg Val Glu
5 10 15
Ile Leu His Ser Gly Gln Trp Gly Thr Ile Cys Asp Asp Arg Trp
20 25 30
Glu Val Arg Val Gly Gln Val Val Cys Arg Ser Leu Gly Tyr Pro
35 40 45
Gly Val Gln Ala Val His Lys Ala Ala His Phe Gly Gln Gly Thr
50 55 60
Gly Pro Ile Trp Leu Asn Glu Val Phe Cys Phe Gly Arg Glu Ser
65 70 75
Ser Ile Glu Glu Cys Lys Ile Arg Gln Trp Gly Thr Arg Ala Cys
80 85 90
Ser His Ser Glu Asp Ala Gly Val Thr Cys Thr
95 100









<210> 17 

<211> 98 

<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 

<222> 109...206

<223> SRCR domain of TADG-12 (TADG12)

<400> 17 



Val Arg Val Gly Gly Gln Asn Ala Val Leu Gln Val Phe Thr Ala
5 10 15
Ala Ser Trp Lys Thr Met Cys Ser Asp Asp Trp Lys Gly His Tyr
20 25 30
Ala Asn Val Ala Cys Ala Gln Leu Gly Phe Pro Ser Tyr Val Ser
35 40 45
Ser Asp Asn Leu Arg Val Ser Ser Leu Glu Gly Gln Phe Arg Glu
50 55 60
Glu Phe Val Ser Ile Asp His Leu Leu Pro Asp Asp Lys Val Thr
65 70 75
Ala Leu His His Ser Val Tyr Val Arg Glu Gly Cys Ala Ser Gly
80 85 90
His Val Val Thr Leu Gln Cys Thr
95



<210> 18 

<211> 94 

<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 






<223> SRCR domain of the serine protease TMPRSS2
      (Tmprss2)
<400> 18 



Val Arg Leu Tyr Gly Pro Asn Phe Ile Leu Gln Met Tyr Ser Ser
5 10 15
Gln Arg Lys Ser Trp His Pro Val Cys Gln Asp Asp Trp Asn Glu
20 25 30
Asn Tyr Gly Arg Ala Ala Cys Arg Asp Met Gly Tyr Lys Asn Asn
35 40 45
Phe Tyr Ser Ser Gln Gly Ile Val Asp Asp Ser Gly Ser Thr Ser
50 55 60
Phe Met Lys Leu Asn Thr Ser Ala Gly Asn Val Asp Ile Tyr Lys
65 70 75
Lys Leu Tyr His Ser Asp Ala Cys Ser Ser Lys Ala Val Val Ser
80 85 90
Leu Arg Cys Leu



<210> 19 

<211> 90 

<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 

<223> SRCR domain of human enterokinase (HumEntk) 

<400> 19 



Val Arg Phe Phe Asn Gly Thr Thr Asn Asn Asn Gly Leu Val Arg
5 10 15
Phe Arg Ile Gln Ser Ile Trp His Thr Ala Cys Ala Glu Asn Trp
20 25 30
Thr Thr Gln Ile Ser Asn Asp Val Cys Gln Leu Leu Gly Leu Gly
35 40 45
Ser Gly Asn Ser Ser Lys Pro Ile Phe Ser Thr Asp Gly Gly Pro
50 55 60
Phe Val Lys Leu Asn Thr Ala Pro Asp Gly His Leu Ile Leu Thr
65 70 75
Pro Ser Gln Gln Cys Leu Gln Asp Ser Leu Ile Arg Leu Gln Cys
80 85 90




<210> 20
<211> 149
<212> PRT
<213> Homo sapiens
<220>
<221> DOMAIN
<223> protease domain of protease M (ProM)
<400> 20

Leu Trp Val Leu Thr Ala Ala His Cys Lys Lys Pro Asn Leu Gln
5 10 15
Val Phe Leu Gly Lys His Asn Leu Arg Gln Arg Glu Ser Ser Gln
20 25 30
Glu Gln Ser Ser Val Val Arg Ala Val Ile His Pro Asp Tyr Asp
35 40 45
Ala Ala Ser His Asp Gln Asp Ile Met Leu Leu Arg Leu Ala Arg
50 55 60
Pro Ala Lys Leu Ser Glu Leu Ile Gln Pro Leu Pro Leu Glu Arg
65 70 75
Asp Cys Ser Ala Asn Thr Thr Ser Cys His Ile Leu Gly Trp Gly
80 85 90
Lys Thr Ala Asp Gly Asp Phe Pro Asp Thr Ile Gln Cys Ala Tyr
95 100 105
Ile His Leu Val Ser Arg Glu Glu Cys Glu His Ala Tyr Pro Gly
110 115 120
Gln Ile Thr Gln Asn Met Leu Cys Ala Gly Asp Glu Lys Tyr Gly
125 130 135
Lys Asp Ser Cys Gln Gly Asp Ser Gly Gly Pro Leu Val Cys
140 145





<210> 21 

<211> 151 

<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 

<223> protease domain of trypsinogen I (Tryl) 

<400> 21 



Gln Trp Val Val Ser Ala Gly His Cys Tyr Lys Ser Arg Ile Gln
5 10 15
Val Arg Leu Gly Glu His Asn Ile Glu Val Leu Glu Gly Asn Glu
20 25 30
Gln Phe Ile Asn Ala Ala Lys Ile Ile Arg His Pro Gln Tyr Asp
35 40 45
Arg Lys Thr Leu Asn Asn Asp Ile Met Leu Ile Lys Leu Ser Ser
50 55 60
Arg Ala Val Ile Asn Ala Arg Val Ser Thr Ile Ser Leu Pro Thr
65 70 75
Ala Pro Pro Ala Thr Gly Thr Lys Cys Leu Ile Ser Gly Trp Gly
80 85 90
Asn Thr Ala Ser Ser Gly Ala Asp Tyr Pro Asp Glu Leu Gln Cys
95 100 105
Leu Asp Ala Pro Val Leu Ser Gln Ala Lys Cys Glu Ala Ser Tyr
110 115 120
Pro Gly Lys Ile Thr Ser Asn Met Phe Cys Val Gly Phe Leu Glu
125 130 135
Gly Gly Lys Asp Ser Cys Gln Gly Asp Ser Gly Gly Pro Val Val
140 145 150
Cys



<210> 22 

<211> 158 

<212> PRT 

<213> Homo sapiens 

<220> 

<221> DOMAIN 

<223> protease domain of plasma kallikrein (Kal) 

<400> 22 

Gln Trp Val Leu Thr Ala Ala His Cys Phe Asp Gly Leu Pro Leu
5 10 15
Gln Asp Val Trp Arg Ile Tyr Ser Gly Ile Leu Asn Leu Ser Asp
20 25 30
Ile Thr Lys Asp Thr Pro Phe Ser Gln Ile Lys Glu Ile Ile Ile
35 40 45
His Gln Asn Tyr Lys Val Ser Glu Gly Asn His Asp Ile Ala Leu
50 55 60
Ile Lys Leu Gln Ala Pro Leu Asn Tyr Thr Glu Phe Gln Lys Pro
65 70 75
Ile Cys Leu Pro Ser Lys Gly Asp Thr Ser Thr Ile Tyr Thr Asn
80 85 90
Cys Trp Val Thr Gly Trp Gly Phe Ser Lys Glu Lys Gly Glu Ile
95 100 105
Gln Asn Ile Leu Gln Lys Val Asn Ile Pro Leu Val Thr Asn Glu
110 115 120
Glu Cys Gln Lys Arg Tyr Gln Asp Tyr Lys Ile Thr Gln Arg Met
125 130 135
Val Cys Ala Gly Tyr Lys Glu Gly Gly Lys Asp Ala Cys Lys Gly
140 145 150
Asp Ser Gly Gly Pro Leu Val Cys
155

<210> 23
<211> 157
<212> PRT
<213> Homo sapiens
<220>
<221> DOMAIN
<223> protease domain of TADG-12 (TADG12)
<400> 23

Leu Trp Ile Ile Thr Ala Ala His Cys Val Tyr Asp Leu Tyr Leu
5 10 15
Pro Lys Ser Trp Thr Ile Gln Val Gly Leu Val Ser Leu Leu Asp
20 25 30
Asn Pro Ala Pro Ser His Leu Val Glu Lys Ile Val Tyr His Ser
35 40 45
Lys Tyr Lys Pro Lys Arg Leu Gly Asn Asp Ile Ala Leu Met Lys
50 55 60
Leu Ala Gly Pro Leu Thr Phe Asn Glu Met Ile Gln Pro Val Cys
65 70 75
Leu Pro Asn Ser Glu Glu Asn Phe Pro Asp Gly Lys Val Cys Trp
80 85 90
Thr Ser Gly Trp Gly Ala Thr Glu Asp Gly Gly Asp Ala Ser Pro
95 100 105
Val Leu Asn His Ala Ala Val Pro Leu Ile Ser Asn Lys Ile Cys
110 115 120
Asn His Arg Asp Val Tyr Gly Gly Ile Ile Ser Pro Ser Met Leu
125 130 135
Cys Ala Gly Tyr Leu Thr Gly Gly Val Asp Ser Cys Gln Gly Asp
140 145 150
Ser Gly Gly Pro Leu Val Cys
155

<210> 24
<211> 159
<212> PRT
<213> Homo sapiens
<220>
<221> DOMAIN
<223> protease domain of TMPRSS2 (Tmprss2)
<400> 24



Glu Trp Ile Val Thr Ala Ala His Cys Val Glu Lys Pro Leu Asn
5 10 15
Asn Pro Trp His Trp Thr Ala Phe Ala Gly Ile Leu Arg Gln Ser
20 25 30
Phe Met Phe Tyr Gly Ala Gly Tyr Gln Val Gln Lys Val Ile Ser
35 40 45
His Pro Asn Tyr Asp Ser Lys Thr Lys Asn Asn Asp Ile Ala Leu
50 55 60
Met Lys Leu Gln Lys Pro Leu Thr Phe Asn Asp Leu Val Lys Pro
65 70 75
Val Cys Leu Pro Asn Pro Gly Met Met Leu Gln Pro Glu Gln Leu
80 85 90
Cys Trp Ile Ser Gly Trp Gly Ala Thr Glu Glu Lys Gly Lys Thr
95 100 105
Ser Glu Val Leu Asn Ala Ala Lys Val Leu Leu Ile Glu Thr Gln
110 115 120
Arg Cys Asn Ser Arg Tyr Val Tyr Asp Asn Leu Ile Thr Pro Ala
125 130 135
Met Ile Cys Ala Gly Phe Leu Gln Gly Asn Val Asp Ser Cys Gln
140 145 150
Gly Asp Ser Gly Gly Pro Leu Val Thr
155















<210> 25
<211> 164
<212> PRT
<213> Homo sapiens
<220>
<221> DOMAIN
<223> protease domain of Hepsin (Heps)
<400> 25

Asp Trp Val Leu Thr Ala Ala His Cys Phe Pro Glu Arg Asn Arg
                5                   10                  15

Val Leu Ser Arg Trp Arg Val Phe Ala Gly Ala Val Ala Gln Ala
                20                  25                  30

Ser Pro His Gly Leu Gln Leu Gly Val Gln Ala Val Val Tyr His
                35                  40                  45

Gly Gly Tyr Leu Pro Phe Arg Asp Pro Asn Ser Glu Glu Asn Ser
                50                  55                  60

Asn Asp Ile Ala Leu Val His Leu Ser Ser Pro Leu Pro Leu Thr
                65                  70                  75

Glu Tyr Ile Gln Pro Val Cys Leu Pro Ala Ala Gly Gln Ala Leu
                80                  85                  90

Val Asp Gly Lys Ile Cys Thr Val Thr Gly Trp Gly Asn Thr Gln
                95                  100                 105

Tyr Tyr Gly Gln Gln Ala Gly Val Leu Gln Glu Ala Arg Val Pro
                110                 115                 120

Ile Ile Ser Asn Asp Val Cys Asn Gly Ala Asp Phe Tyr Gly Asn






                125                 130                 135

Gln Ile Lys Pro Lys Met Phe Cys Ala Gly Tyr Pro Glu Gly Gly
                140                 145                 150

Ile Asp Ala Cys Gln Gly Asp Ser Gly Gly Pro Phe Val Cys
                155                 160



<210> 26
<211> 23
<212> DNA
<213> Artificial sequence
<220>
<221> primer_bind
<222> 6, 9, 12, 15, 18
<223> forward redundant primer for the consensus
sequences of amino acids surrounding the catalytic
triad for serine proteases, n = inosine
<400> 26

tgggtngtna cngcngcnca ytg 23



<210> 27
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<221> primer_bind
<222> 3, 6, 9, 12, 15, 18
<223> reverse redundant primer for the consensus
sequences of amino acids surrounding the catalytic
triad for serine proteases, n = inosine
<400> 27

arnarngcna tntcnttncc 20



<210> 28
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<221> primer_bind
<223> forward oligonucleotide primer
used for quantitative PCR
<400> 28

gaaacatgtc cttgctctcg 20



<210> 29
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<221> primer_bind
<223> reverse oligonucleotide primer for TADG-12
used for quantitative PCR
<400> 29






actaacttcc acagcctcct 20 



<210> 30
<211> 20
<212> DNA
<213> Artificial sequence
<220>
<221> primer_bind
<223> forward oligonucleotide primer for TADG-12
variant (TADG-12V) used for quantitative PCR
<400> 30

tccaggtggg tctagtttcc 20

<210> 31 
<211> 20 
<212> DNA 

<213> Artificial sequence 

<220> 

<221> primer_bind 

<223> reverse oligonucleotide primer for TADG-12 

variant (TADG-12V) used for quantitative PCR
<400> 31 

ctctttggct tgtacttgct 20 

<210> 32 
<211> 20 
<212> DNA 

<213> Artificial sequence 

<220> 

<221> primer_bind 

<223> forward oligonucleotide primer for β-tubulin

used as an internal control for quantitative PCR
<400> 32 

cgcatcaacg tgtactacaa 20 

<210> 33 
<211> 20 
<212> DNA 

<213> Artificial sequence 

<220> 

<221> primer_bind

<223> reverse oligonucleotide primer for β-tubulin

used as an internal control for quantitative PCR
<400> 33 

tacgagctgg tggactgaga 20 

<210> 34 

<211> 12 

<212> PRT 

<213> Artificial sequence 
<220> 

<223> a poly-lysine linked multiple antigen peptide






derived from the TADG-12 carboxy-terminal protein
sequence, present in full length TADG-12, but not 
in TADG-12V 
<400> 34 

Trp Ile His Glu Gln Met Glu Arg Asp Leu Lys Thr

5 10 

<210> 35 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 40...48

<223> TADG-12 peptide 

<400> 35 

Ile Leu Ser Leu Leu Pro Phe Glu Val





5 




<210> 36
<211> 9
<212> PRT
<213> Homo sapiens
<220>
<222> 144...152
<223> TADG-12 peptide
<400> 36

Ala Gln Leu Gly Phe Pro Ser Tyr Val

5




<210> 37
<211> 9
<212> PRT
<213> Homo sapiens
<220>
<222> 225...233
<223> TADG-12 peptide
<400> 37

Leu Leu Ser Gln Trp Pro Trp Gln Ala

5

<210> 38 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 252...260

<223> TADG-12 peptide 

<400> 38 

Trp Ile Ile Thr Ala Ala His Cys Val

5 






<210> 39 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 356...364

<223> TADG-12 peptide 

<400> 39 

Val Leu Asn His Ala Ala Val Pro Leu 

5 

<210> 40 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 176 . . .184 

<223> TADG-12 peptide 

<400> 40 

Leu Leu Pro Asp Asp Lys Val Thr Ala 

5 

<210> 41 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 13 ... 21 

<223> TADG-12 peptide 

<400> 41 

Phe Ser Phe Arg Ser Leu Phe Gly Leu 

5 

<210> 42 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 151. . .159 

<223> TADG-12 peptide 

<400> 42 

Tyr Val Ser Ser Asp Asn Leu Arg Val 

5 

<210> 43 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 436...444

<223> TADG-12 peptide 

<400> 43 






Arg Val Thr Ser Phe Leu Asp Trp Ile

5 



<210> 44 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 234. . .242 

<223> TADG-12 peptide 

<400> 44 



Ser Leu Gln Phe Gln Gly Tyr His Leu

5 



<210> 45 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 181. . .189 

<223> TADG-12 peptide 

<400> 45 



Lys Val Thr Ala Leu His His Ser Val 

5 



<210> 46 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 183 . . . 191 

<223> TADG-12 peptide 

<400> 46 



Thr Ala Leu His His Ser Val Tyr Val 

5 



<210> 47 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 411...419

<223> TADG-12 peptide 

<400> 47 



Arg Leu Trp Lys Leu Val Gly Ala Thr 

5 



<210> 48 

<211> 9 

<212> PRT 

<213> Homo sapiens 






<220> 

<222> 60...68

<223> TADG-12 peptide 

<400> 48 



Leu Ile Leu Ala Leu Ala Ile Gly Leu

5 



<210> 49 

<211> 9 

<212> PRT 

<213> Homo sapiens

<220> 

<222> 227 . . .235 

<223> TADG-12 peptide 

<400> 49 



Ser Gln Trp Pro Trp Gln Ala Ser Leu

5 



<210> 50 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 301. . .309 

<223> TADG-12 peptide 

<400> 50 



Arg Leu Gly Asn Asp Ile Ala Leu Met

5 



<210> 51 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 307 . . . 315 

<223> TADG-12 peptide 

<400> 51 



Ala Leu Met Lys Leu Ala Gly Pro Leu 

5 



<210> 52 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 262 . . .270 

<223> TADG-12 peptide 

<400> 52 



Asp Leu Tyr Leu Pro Lys Ser Trp Thr 

5 






<210> 53 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 416. . .424 

<223> TADG-12 peptide 

<400> 53 

Leu Val Gly Ala Thr Ser Phe Gly Ile

5 

<210> 54 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 54...62

<223> TADG-12 peptide 

<400> 54 

Ser Leu Gly Ile Ile Ala Leu Ile Leu

5 

<210> 55 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 218. . .226 

<223> TADG-12 peptide 

<400> 55 

Ile Val Gly Gly Asn Met Ser Leu Leu

5 

<210> 56 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 35 . . .43 

<223> TADG-12 peptide 

<400> 56 

Ala Val Ala Ala Gln Ile Leu Ser Leu

5 

<210> 57 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 271 . . .279 

<223> TADG-12 peptide 

<400> 57 






Ile Gln Val Gly Leu Val Ser Leu Leu

5 



<210> 58 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 397. . .405 

<223> TADG-12 peptide 

<400> 58 



Cys Gln Gly Asp Ser Gly Gly Pro Leu

5 



<210> 59 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 270 . . .278 

<223> TADG-12 peptide 

<400> 59 



Thr Ile Gln Val Gly Leu Val Ser Leu

5 



<210> 60 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 56 ... 64 

<223> TADG-12 peptide 

<400> 60 



Gly Ile Ile Ala Leu Ile Leu Ala Leu

5 



<210> 61 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 110...118

<223> TADG-12 peptide 

<400> 61 



Arg Val Gly Gly Gln Asn Ala Val Leu

5 



<210> 62 

<211> 9 

<212> PRT 

<213> Homo sapiens 






<220> 

<222> 217 . . .225 

<223> TADG-12 peptide 

<400> 62 



Arg Ile Val Gly Gly Asn Met Ser Leu

5 



<210> 63 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 130 . . . 138 

<223> TADG-12 peptide 

<400> 63 



Cys Ser Asp Asp Trp Lys Gly His Tyr 

5 



<210> 64 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 8...16

<223> TADG-12 peptide 

<400> 64 



Ala Val Glu Ala Pro Phe Ser Phe Arg 

5 



<210> 65 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 328 . . . 336 

<223> TADG-12 peptide 

<400> 65 



Asn Ser Glu Glu Asn Phe Pro Asp Gly 

5 



<210> 66 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 3 ... 11 

<223> TADG-12 peptide 

<400> 66 



Glu Asn Asp Pro Pro Ala Val Glu Ala 

5 






<210> 67 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 98...106

<223> TADG-12 peptide 

<400> 67 

Asp Cys Lys Asp Gly Glu Asp Glu Tyr 

5 

<210> 68 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 346 . . .354 

<223> TADG-12 peptide 

<400> 68 

Ala Thr Glu Asp Gly Gly Asp Ala Ser 

5 

<210> 69 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 360 ... 368 

<223> TADG-12 peptide 

<400> 69 

Ala Ala Val Pro Leu Ile Ser Asn Lys

5 

<210> 70 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 153 . . . 161 

<223> TADG-12 peptide 

<400> 70 

Ser Ser Asp Asn Leu Arg Val Ser Ser 

5 

<210> 71 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 182 . . . 190 

<223> TADG-12 peptide 

<400> 71 






Val Thr Ala Leu His His Ser Val Tyr 

5 



<210> 72 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 143...151

<223> TADG-12 peptide 

<400> 72 



Cys Ala Gln Leu Gly Phe Pro Ser Tyr

5 



<210> 73 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 259 . . .267 

<223> TADG-12 peptide 

<400> 73 



Cys Val Tyr Asp Leu Tyr Leu Pro Lys 

5 



<210> 74 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 369 . . .377 

<223> TADG-12 peptide 

<400> 74 



Ile Cys Asn His Arg Asp Val Tyr Gly

5 



<210> 75 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 278. . .286 

<223> TADG-12 peptide 

<400> 75 



Leu Leu Asp Asn Pro Ala Pro Ser His 

5 



<210> 76 

<211> 9 

<212> PRT 

<213> Homo sapiens 






<220> 

<222> 426 . . .434 

<223> TADG-12 peptide 

<400> 76 



Cys Ala Glu Val Asn Lys Pro Gly Val 

5 



<210> 77
<211> 9
<212> PRT
<213> Homo sapiens
<220>
<222> 32...40
<223> TADG-12 peptide
<400> 77

Asp Ala Asp Ala Val Ala Ala Gln Ile

5



<210> 78 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 406 . . .414 

<223> TADG-12 peptide 

<400> 78 



Val Cys Gln Glu Arg Arg Leu Trp Lys

5 



<210> 79
<211> 9
<212> PRT
<213> Homo sapiens
<220>
<222> 329...337
<223> TADG-12 peptide
<400> 79

Ser Glu Glu Asn Phe Pro Asp Gly Lys

5



<210> 80
<211> 9
<212> PRT
<213> Homo sapiens
<220>
<222> 303...311
<223> TADG-12 peptide
<400> 80

Gly Asn Asp Ile Ala Leu Met Lys Leu

5






<210> 81 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 127...135

<223> TADG-12 peptide 

<400> 81 

Lys Thr Met Cys Ser Asp Asp Trp Lys 

5 

<210> 82 

<211> 9 

<212> PRT 

<213> Homo sapiens

<220> 

<222> 440 . . .448 

<223> TADG-12 peptide 

<400> 82 

Phe Leu Asp Trp Ile His Glu Gln Met

5 

<210> 83 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 433 . . .441 

<223> TADG-12 peptide 

<400> 83 

Val Tyr Thr Arg Val Thr Ser Phe Leu 

5 

<210> 84 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 263 . . . 271 

<223> TADG-12 peptide 

<400> 84 

Leu Tyr Leu Pro Lys Ser Trp Thr Ile

5

<210> 85 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 169...177

<223> TADG-12 peptide 

<400> 85 






Glu Phe Val Ser Ile Asp His Leu Leu

5 

<210> 86 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 296 . . .304 

<223> TADG-12 peptide 

<400> 86 

Lys Tyr Lys Pro Lys Arg Leu Gly Asn

5 

<210> 87 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 16...24

<223> TADG-12 peptide 

<400> 87 

Arg Ser Leu Phe Gly Leu Asp Asp Leu 

5 

<210> 88 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 267 . . .275 

<223> TADG-12 peptide 

<400> 88

Lys Ser Trp Thr Ile Gln Val Gly Leu

5 

<210> 89 

<211> 9 

<212> PRT 

<213> Homo sapiens

<220> 

<222> 81. . .89 

<223> TADG-12 peptide 

<400> 89 

Arg Ser Ser Phe Lys Cys Ile Glu Leu

5 



<210> 90 
<211> 9 
<212> PRT 






<213> Homo sapiens 
<220> 

<222> 375. . .383 

<223> TADG-12 peptide 

<400> 90 



Val Tyr Gly Gly Ile Ile Ser Pro Ser

5 



<210> 91 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 110. . .118 

<223> TADG-12 peptide 

<400> 91 



Arg Val Gly Gly Gln Asn Ala Val Leu

5 



<210> 92 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 189. . .197 

<223> TADG-12 peptide 

<400> 92 



Val Tyr Val Arg Glu Gly Cys Ala Ser 

5 



<210> 93 

<211> 9 

<212> PRT 

<213> Homo sapiens

<220> 

<222> 165... 173 

<223> TADG-12 peptide 

<400> 93 



Gln Phe Arg Glu Glu Phe Val Ser Ile

5 



<210> 94 

<211> 9 

<212> PRT 

<213> Homo sapiens

<220> 

<222> 10. . .18 

<223> TADG-12 peptide 

<400> 94 



Glu Ala Pro Phe Ser Phe Arg Ser Leu 

5 






<210> 95 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 407 . . .415 

<223> TADG-12 peptide 

<400> 95 



Cys Gln Glu Arg Arg Leu Trp Lys Leu

5 



<210> 96 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 381. . .389 

<223> TADG-12 peptide 

<400> 96 



Ser Pro Ser Met Leu Cys Ala Gly Tyr 

5 



<210> 97 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 375...383

<223> TADG-12 peptide 

<400> 97 



Val Tyr Gly Gly Ile Ile Ser Pro Ser

5 



<210> 98 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 381...389

<223> TADG-12 peptide 

<400> 98 



Ser Pro Ser Met Leu Cys Ala Gly Tyr 

5 



<210> 99 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 362 . . .370 

<223> TADG-12 peptide






<400> 99 

Val Pro Leu Ile Ser Asn Lys Ile Cys

5 

<210> 100 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 373 . . .381 

<223> TADG-12 peptide 

<400> 100 

Arg Asp Val Tyr Gly Gly Ile Ile Ser

5 

<210> 101 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 283 . . .291 

<223> TADG-12 peptide 

<400> 101 

Ala Pro Ser His Leu Val Glu Lys Ile

5 

<210> 102 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 177...185

<223> TADG-12 peptide 

<400> 102 

Leu Pro Asp Asp Lys Val Thr Ala Leu 

5 

<210> 103 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 47 ... 55 

<223> TADG-12 peptide 

<400> 103 

Glu Val Phe Ser Gin Ser Ser Ser Leu 

5 

<210> 104 

<211> 9 

<212> PRT 






<213> Homo sapiens 
<220> 

<222> 36. . .44 

<223> TADG-12 peptide 

<400> 104 

Val Ala Ala Gln Ile Leu Ser Leu Leu

5 

<210> 105 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 255 . . .263 

<223> TADG-12 peptide 

<400> 105 

Thr Ala Ala His Cys Val Tyr Asp Leu 

5 

<210> 106 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 138. . .146 

<223> TADG-12 peptide 

<400> 106 

Tyr Ala Asn Val Ala Cys Ala Gln Leu

5 

<210> 107 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 195 . . .203 

<223> TADG-12 peptide 

<400> 107 

Cys Ala Ser Gly His Val Val Thr Leu 

5 

<210> 108 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 215 . . .223 

<223> TADG-12 peptide 

<400> 108 

Ser Ser Arg Ile Val Gly Gly Asn Met

5 






<210> 109 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 298. . .306 

<223> TADG-12 peptide 

<400> 109 



Lys Pro Lys Arg Leu Gly Asn Asp Ile

5 



<210> 110 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 313 . . .321 

<223> TADG-12 peptide 

<400> 110 



Gly Pro Leu Thr Phe Asn Glu Met Ile

5 



<210> 111

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 108 . . . 116 

<223> TADG-12 peptide 

<400> 111 



Cys Val Arg Val Gly Gly Gln Asn Ala

5 



<210> 112 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 294. . .302 

<223> TADG-12 peptide 

<400> 112 



His Ser Lys Tyr Lys Pro Lys Arg Leu 

5 

<210> 113 

<211> 9 

<212> PRT 

<213> Homo sapiens

<220> 

<222> 265 . . .273 

<223> TADG-12 peptide 






<400> 113 

Leu Pro Lys Ser Trp Thr Ile Gln Val

5 

<210> 114 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 88 ... 96 

<223> TADG-12 peptide 

<400> 114 

Glu Leu Ile Thr Arg Cys Asp Gly Val

5 

<210> 115 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 79. . .87 

<223> TADG-12 peptide 

<400> 115 

Arg Cys Arg Ser Ser Phe Lys Cys Ile

5 

<210> 116 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 255...263

<223> TADG-12 peptide 

<400> 116 

Thr Ala Ala His Cys Val Tyr Asp Leu 

5 

<210> 117 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 207 . . .215 

<223> TADG-12 peptide 

<400> 117 

Ala Cys Gly His Arg Arg Gly Tyr Ser 

5 

<210> 118 
<211> 9 
<212> PRT 






<213> Homo sapiens 
<220> 

<222> 154 . . . 162 

<223> TADG-12 peptide 

<400> 118 



Ser Asp Asn Leu Arg Val Ser Ser Leu 

5 



<210> 119 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 300 . . .308 

<223> TADG-12 peptide 

<400> 119 



Lys Arg Leu Gly Asn Asp Ile Ala Leu

5 



<210> 120 

<211> 9 

<212> PRT 

<213> Homo sapiens

<220> 

<222> 435 . . .443 

<223> TADG-12 peptide 

<400> 120 



Thr Arg Val Thr Ser Phe Leu Asp Trp 

5 



<210> 121 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 376...384

<223> TADG-12 peptide 

<400> 121 



Tyr Gly Gly Ile Ile Ser Pro Ser Met

5 



<210> 122 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 410 . . .418 

<223> TADG-12 peptide 

<400> 122 



Arg Arg Leu Trp Lys Leu Val Gly Ala 

5 






<210> 123 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 210...218

<223> TADG-12 peptide 

<400> 123 



His Arg Arg Gly Tyr Ser Ser Arg Ile

5 



<210> 124 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 109 . . . 117 

<223> TADG-12 peptide 

<400> 124 



Val Arg Val Gly Gly Gln Asn Ala Val

5 



<210> 125 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 191. . .199 

<223> TADG-12 peptide 

<400> 125 

Val Arg Glu Gly Cys Ala Ser Gly His 

5 



<210> 126 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 78 ... 86 

<223> TADG-12 peptide 

<400> 126 



Tyr Arg Cys Arg Ser Ser Phe Lys Cys 

5 



<210> 127 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 113 . . .121 
<223> TADG-12 peptide




<400> 127 

Gly Gln Asn Ala Val Leu Gln Val Phe

5 

<210> 128 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 91...99

<223> TADG-12 peptide

<400> 128 

Thr Arg Cys Asp Gly Val Ser Asp Cys 

5 

<210> 129 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 38 ... 46 

<223> TADG-12 peptide 

<400> 129 

Ala Gln Ile Leu Ser Leu Leu Pro Phe

5 

<210> 130 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 211. . .219 

<223> TADG-12 peptide 

<400> 130 

Arg Arg Gly Tyr Ser Ser Arg Ile Val

5 

<210> 131 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 216 . . .224 

<223> TADG-12 peptide 

<400> 131 

Ser Arg Ile Val Gly Gly Asn Met Ser

5 

<210> 132 

<211> 9 

<212> PRT 




<213> Homo sapiens 
<220> 

<222> 118...126

<223> TADG-12 peptide 

<400> 132 

Leu Gln Val Phe Thr Ala Ala Ser Trp

5 

<210> 133 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 370. . .378 

<223> TADG-12 peptide 

<400> 133 

Cys Asn His Arg Asp Val Tyr Gly Gly 

5 

<210> 134 

<211> 9 

<212> PRT 

<213> Homo sapiens

<220> 

<222> 393... 401 

<223> TADG-12 peptide 

<400> 134 

Gly Val Asp Ser Cys Gln Gly Asp Ser

5 

<210> 135 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 235. . .243 

<223> TADG-12 peptide 

<400> 135 

Leu Gln Phe Gln Gly Tyr His Leu Cys

5 

<210> 136 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 427...435

<223> TADG-12 peptide 

<400> 136 

Ala Glu Val Asn Lys Pro Gly Val Tyr 

5 






<210> 137 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 162...170

<223> TADG-12 peptide 

<400> 137 



Leu Glu Gly Gln Phe Arg Glu Glu Phe

5 



<210> 138 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 9 ... 17 

<223> TADG-12 peptide 

<400> 138 



Val Glu Ala Pro Phe Ser Phe Arg Ser 

5 



<210> 139 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 318. . .326 

<223> TADG-12 peptide 

<400> 139 

Asn Glu Met Ile Gln Pro Val Cys Leu

5 



<210> 140 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 256 . . .264 

<223> TADG-12 peptide 

<400> 140 



Ala Ala His Cys Val Tyr Asp Leu Tyr 

5 



<210> 141 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 46...54

<223> TADG-12 peptide 




<400> 141 

Phe Glu Val Phe Ser Gln Ser Ser Ser

5 

<210> 142 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 64 . . .72 

<223> TADG-12 peptide 

<400> 142 

Leu Ala Ile Gly Leu Gly Ile His Phe

5 

<210> 143 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 192 . . .200 

<223> TADG-12 peptide 

<400> 143 



Arg Glu Gly Cys Ala Ser Gly His Val 

5 



<210> 144 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 330... 338 

<223> TADG-12 peptide 

<400> 144 



Glu Glu Asn Phe Pro Asp Gly Lys Val 

5 



<210> 145 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 182. . .190 

<223> TADG-12 peptide 

<400> 145 

Val Thr Ala Leu His His Ser Val Tyr 

5 



<210> 146 
<211> 9 
<212> PRT 






<213> Homo sapiens 
<220> 

<222> 408. . .416 

<223> TADG-12 peptide 

<400> 146 



Gln Glu Arg Arg Leu Trp Lys Leu Val

5 



<210> 147
<211> 9
<212> PRT
<213> Homo sapiens
<220>
<222> 206...214
<223> TADG-12 peptide
<400> 147

Thr Ala Cys Gly His Arg Arg Gly Tyr

5



<210> 148
<211> 9
<212> PRT
<213> Homo sapiens
<220>
<222> 5...13
<223> TADG-12 peptide
<400> 148

Asp Pro Pro Ala Val Glu Ala Pro Phe

5



<210> 149
<211> 9
<212> PRT
<213> Homo sapiens
<220>
<222> 261...269
<223> TADG-12 peptide
<400> 149

Tyr Asp Leu Tyr Leu Pro Lys Ser Trp

5



<210> 150
<211> 9
<212> PRT
<213> Homo sapiens
<220>
<222> ...41
<223> TADG-12 peptide
<400> 150

Ala Asp Ala Val Ala Ala Gln Ile Leu

5






<210> 151 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 168 . . .176 

<223> TADG-12 peptide 

<400> 151 



Glu Glu Phe Val Ser Ile Asp His Leu

5 



<210> 152 

<211> 9 

<212> PRT 

<213> Homo sapiens

<220> 

<222> 304 . . .312 

<223> TADG-12 peptide 

<400> 152 



Asn Asp Ile Ala Leu Met Lys Leu Ala

5 



<210> 153 

<211> 9 

<212> PRT 

<213> Homo sapiens 

<220> 

<222> 104. . .112 

<223> TADG-12 peptide 

<400> 153 



Asp Glu Tyr Arg Cys Val Arg Val Gly 

5 






INTERNATIONAL SEARCH REPORT 



International application No. 
PCT/US00/05612



A. CLASSIFICATION OF SUBJECT MATTER 

IPC(7): C07K 14/435, 14/705; A61K 38/03, 38/08, 38/17
US CL: 530/350; 514/2
According to International Patent Classification (IPC) or to both national classification and IPC 




Minimum documentation searched (classification system followed by classification symbols) 
U.S. : 530/350; 514/2 


Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched 


Electronic data base consulted during the international search (name of data base and, where practicable, search terms used) 
protein and nucleic acids databases 


C. DOCUMENTS CONSIDERED TO BE RELEVANT

Category* / Citation of document, with indication, where appropriate, of the relevant passages / Relevant to claim No.

Category: X
Citation: TANIMOTO et al. Cloning and expression of TADG-15, a novel serine protease expressed in ovarian cancer. Proceedings of the American Association for Cancer Research. March 1998, Vol. 39, page 648, abstract #4414, see entire abstract.
Relevant to claim No.: 1, 12, 18-21, 23

Category: X
Citation: O'BRIEN et al. Cloning and expression TADG-15, a novel serine protease expressed in ovarian cancer. Tumor Biology. 1998, Supplement 2, page 33, abstract 0-42, see entire abstract.
Relevant to claim No.: 1, 12, 18-21, 23

Category: X
Citation: WO 98/41656 A1 (THE BOARD OF TRUSTEES OF THE UNIVERSITY OF ARKANSAS) 24 September 1998, claim 5, page 8.
Relevant to claim No.: 22

Category: X, P
Citation: Database Genecore version 4.5. Accession number AW104113, NCI-CGAP, 'National Cancer Institute, Cancer Genome Anatomy Project (CGAP), Tumor Gene Index,' sequence listing, 20 October 1999, see sequence listing.
Relevant to claim No.: 1, 2


[ ] Further documents are listed in the continuation of Box C. [ ] See patent family annex.


* Special categories of cited documents:

"A" document defining the general state of the art which is not considered to be of particular relevance

"E" earlier document published on or after the international filing date

"L" document which may throw doubts on priority claim(s) or which is cited to establish the publication date of another citation or other special reason (as specified)

"O" document referring to an oral disclosure, use, exhibition or other means

"P" document published prior to the international filing date but later than the priority date claimed

"T" later document published after the international filing date or priority date and not in conflict with the application but cited to understand the principle or theory underlying the invention

"X" document of particular relevance; the claimed invention cannot be considered novel or cannot be considered to involve an inventive step when the document is taken alone

"Y" document of particular relevance; the claimed invention cannot be considered to involve an inventive step when the document is combined with one or more other such documents, such combination being obvious to a person skilled in the art

"&" document member of the same patent family


Date of the actual completion of the international search 
04 JUNE 2000 


Date of mailing of the international search report

06 JUL 2000


Name and mailing address of the ISA/US
Commissioner of Patents and Trademarks
Box PCT 

Washington, D.C. 20231 
Facsimile No. (703) 305-3230 


Authorized officer 

KARPN A f-AWPI I A ^1^1 


Telephone No. (703) 308- 12M^ 



Form PCT/ISA/210 (second sheet) (July 1998)* 





Exhibit 26 



The Journal of Biological Chemistry
© 1993 by The American Society for Biochemistry and Molecular Biology, Inc.

Vol. 268, Issue of September 6, pp. 19101-19109, 1993

Printed in U.S.A.



Role of NH2-terminal Positively Charged Residues in Establishing 
Membrane Protein Topology* 



(Received for publication, April 14, 1993, and in revised form. May 21, 1993) 



Griffith D. Parks‡ and Robert A. Lamb§

From the Howard Hughes Medical Institute and Department of Biochemistry, Molecular Biology and Cell Biology, Northwestern
University, Evanston, Illinois 60208-3500



The paramyxovirus HN polypeptide is a model type II membrane protein,
containing an internal uncleaved signal/anchor (S/A) and is oriented in the
membrane with an NH2-terminal cytoplasmic domain and COOH-terminal
ectodomain (Ncyt topology). To test the role of NH2-terminal positively
charged residues in directing the HN membrane topology, the 3 arginine (Arg)
residues within the 17-amino-acid NH2-terminal domain were systematically
converted to a glutamine or glutamate, and the topology of the mutant
proteins was examined after expression in CV-1 cells. The data indicate that:
(i) each of the NH2-terminal Arg residues contributes to the signal directing
proper HN topology, since substitutions in any of the three positions
resulted in ~13-23% inversion into the Nexo form; (ii) substitutions in the
Arg directly flanking the signal/anchor domain resulted in slightly more
inversion than those which were located more distally; and (iii) substitution
with a negatively charged glutamate led to more inversion than did
replacement with an uncharged glutamine. The effect of a single Arg to Glu
substitution on the HN topology was enhanced when present in the context of a
truncated NH2-terminal cytoplasmic tail (3 residues). A comparison of the
sequences flanking the signal/anchor of well documented type III proteins
showed that the majority of these proteins contain a negatively charged
residue flanking the NH2-terminal side. An exception to this rule is the NB
protein which contains a single positively charged Arg residue in this
position. A chimeric protein containing the NB ectodomain and the HN S/A and
HN ectodomain led to a significant fraction (70%) of the chimeric protein
adopting type II topology, suggesting that the positive charge flanking the
S/A domain is important for establishing type II topology. These data are
discussed in the context of the loop model for the biogenesis of integral
membrane proteins and the possible signals necessary for establishing
differing orientations.



The ability of an integral membrane protein to function 
properly depends on the precise targeting of the cytoplasmic 

* The costs of publication of this article were defrayed in part by 
the payment of page charges. This article must therefore be hereby 
marked "advertisement'* in accordance with 18 U.S*C. Section 1734 
solely to indicate this fact. 

X Associate of the Howard Hughea Medical Institute. Present ad- 
dress: Dept. of Microbiology and Immunology, Bowman Gray School 
of Medicine of Wake Forest University, Winston-Salem, NC 27157- 
1064. 

§ Investigator of the Howard Hughes Medical Institute. To whom 
correspondence should be addressed: Dept. of Biochemistry, Molec- 
ular Biology and Cell Biology, 2153 Sheridan Rd., Evanston, IL 
00208-3600. Tel.: 708-491-5433; Pax: 708-491-2467. 



and extracellular domains of the polypeptide to the correct 
side of the membrane. The signals directing a protein into a 
characteristic membrane topology are contained within the 
amino acid sequence of the polypeptide (Blobel, 1980) and 
must be very precise as it appears that all naturally occurring 
membrane proteins adopt only a single final orientation. The 
majority of known membrane proteins which span the lipid 
bilayer a single time are classified as type I proteins (nomen-
clature of von Heijne and Gavel, 1988), based on the presence
of both an NH2-terminal cleavable signal sequence which
targets the nascent polypeptide to the ER¹ membrane through
an interaction with the signal recognition particle (SRP; 
Walter and Lingappa, 1986) and a separate COOH-terminal 
hydrophobic domain which acts as a stop transfer domain 
(membrane anchor). These proteins have an extracellular 
NH2-terminal domain and a cytoplasmic COOH-terminal tail 
(Nexo topology). A second class of membrane proteins has
been found, with fewer known members than the type I
membrane proteins, in which the proteins adopt the opposite
orientation and have an NH2-terminal cytoplasmic tail and a
COOH-terminal ectodomain (Ncyt topology). These type II 
proteins lack an NH2-terminal cleavable signal sequence, but 
contain an internal hydrophobic signal/anchor (S/A) which 
serves a dual function: the signaling of the nascent polypep- 
tide to the ER membrane and the subsequent anchoring of 
the polypeptide in the lipid bilayer. Examples of type II 
proteins include the transferrin receptor (Schneider et al.,
1984), asialoglycoprotein receptor (Spiess and Lodish, 1986),
the family of Golgi-resident glycosyltransferases (Paulson and
Colley, 1989), and the paramyxovirus HN protein (Hiebert et
al., 1985). The least common class of membrane proteins that
span the lipid bilayer a single time are the type III proteins
which also contain an internal uncleaved S/A, but these
proteins have an extracellular NH2-terminal domain and are
in the Nexo orientation. Examples of type III proteins include
the cytochrome P-450 proteins (Nelson and Strobel, 1988),
the erythrocyte sialoglycoprotein β (High and Tanner, 1987),
and the influenza A virus M2 protein and influenza B virus
NB protein (Lamb et al., 1986; Williams and Lamb, 1986).

In contrast to the cleavable signal sequences of the type I 
membrane proteins which have been analyzed in detail both
by amino acid comparison (von Heijne, 1984, 1985) and ex-
perimentally (e.g. Nothwehr and Gordon, 1989), relatively
little is known about the structural features which distinguish
the two types of membrane proteins with internal uncleaved
S/A sequences. The type II and III proteins both appear to
use the same SRP-mediated mechanism for targeting to the
ER membrane (Lipp and Dobberstein, 1986b; Hull et al.,
1988). However, the signals which direct the steps following 

¹The abbreviations used are: ER, endoplasmic reticulum; SRP,
signal recognition particle; N-glycanase, peptide:N-glycosidase F;
PAGE, polyacrylamide gel electrophoresis. 






Charged Residues Direct Membrane Protein Topology



[Fig. 1, graphic not recoverable from the scan. Panel A: schematic of the HN WT* NH2-terminal sequence (MVNATEDAPVRATCRVLFR) followed by the S/A, with the Glu (E) or Gln (Q) substitutions of mutants 10*, 12*, and 14*-25* and their percent Nexo values. Percent Nexo values legible in the scan: 10*, 23; 12*, 18; 14*, 23; 15*, 14; 16*, 13; 17*, 13; 18*, 56; 19*, 42; 20*, 48; 21*, 39; 22*, 44; 23*, 36; 24*, 80; 25*, 76. Panel B: SDS-PAGE lanes WT*, 10*, 12*, and 14*-25*, with bands labeled Ncyt and Nexo.]







Fig. 1. Structure and expression of HN* arginine substitution mutants. A, schematic diagram of Arg substitution mutants. The
amino acid sequence of the NH2-terminal domain of HN WT* is shown in the one letter code with the HN signal/anchor
depicted as a hatched box. A solid horizontal line denotes sequence identity to WT* with glutamate (E) or glutamine (Q) substitutions shown






this interaction of the S/A with SRP and lead to exclusively
the Nexo or Ncyt topology have not been determined. Hydro-
phobicity appears to be the only structural requirement for
an uncleaved S/A to function in the targeting and anchoring
of a polypeptide (Audigier et al., 1987; Parks et al., 1989; Zerial
et al., 1987). As such, the analysis of topogenic sequences of
type II and III proteins has focused on residues flanking the
S/A domain, and it has been shown that these two types of
proteins can be inverted in the membrane by complete ex-
change of NH2- or COOH-terminal S/A-flanking regions
(Haeuptle et al., 1989; Parks et al., 1989; Parks and Lamb,
1991). On the basis of a theoretical analysis, based on amino
acid sequences available from databases and examining amino
acid sequences flanking S/A domains, two different hy-
potheses have been proposed to explain the orientation of
type II and III integral membrane proteins. (a) The "charge
difference" rule (Hartmann et al., 1989) proposed that when
the difference in the sum of positive and negative charges
within 15 residues of the NH2- and COOH-terminal sides of
the S/A domain was calculated, the more positive side was
cytoplasmic, in the manner of a dipole moment. (b) The
"positive inside" rule (von Heijne, 1986; von Heijne and Gavel,
1988) proposed that the topology of the protein is governed
by positive charges alone, and the domain containing the most
positive charges is cytoplasmic. However, in the case of two
different type II proteins, data obtained from a systematic
mutational analysis did not support either the charge differ-
ence rule or the positive inside rule (Beltzer et al., 1991; Parks
and Lamb, 1991). The experimental data indicated that pos-
itive charges in the NH2-terminal domain of type II proteins
play a pivotal role in directing the Ncyt topology, since it has
been shown that the removal of positive charges from the
NH2-terminal S/A-flanking region leads to inversion of type
II proteins into the Nexo orientation, while the addition of
positive charges to the COOH-terminal S/A-flanking region
alone has little effect on topology (Beltzer et al., 1991; Parks
and Lamb, 1991).
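
The scoring behind the two rules can be illustrated with a minimal sketch. It assumes that only Arg and Lys count as positive and only Asp and Glu as negative (His and the terminal amino and carboxyl groups are ignored), and that a tie gives no prediction. The function names and the COOH-terminal flank in the example are hypothetical; the NH2-terminal sequence is that of HN WT* as drawn in Fig. 2. Because the mutational data cited above did not support either rule, the sketch shows only how each rule is scored, not how HN actually behaves.

```python
POSITIVE = set("KR")   # assumption: His is not counted as charged
NEGATIVE = set("DE")


def positive_count(segment):
    """Number of Arg/Lys residues in a segment."""
    return sum(1 for residue in segment if residue in POSITIVE)


def net_charge(segment):
    """Positive minus negative residues in a segment."""
    negatives = sum(1 for residue in segment if residue in NEGATIVE)
    return positive_count(segment) - negatives


def _call(n_score, c_score):
    # The higher-scoring side is predicted to be cytoplasmic.
    if n_score > c_score:
        return "Ncyt"
    if n_score < c_score:
        return "Nexo"
    return "undetermined"


def charge_difference_rule(n_flank, c_flank, window=15):
    """Compare net charge within `window` residues on each side of the S/A."""
    return _call(net_charge(n_flank[-window:]), net_charge(c_flank[:window]))


def positive_inside_rule(n_domain, c_domain):
    """Compare only the positive charges of the two domains."""
    return _call(positive_count(n_domain), positive_count(c_domain))


wt_n_flank = "MVNATEDAPVRATCRVLFR"       # HN WT* NH2-terminal domain
all_glu_n_flank = "MVNATEDAPVEATCEVLFE"  # all 3 Arg replaced by Glu
c_flank = "SNQTEDLSGA"                   # hypothetical COOH-terminal flank

print(charge_difference_rule(wt_n_flank, c_flank))
print(positive_inside_rule(wt_n_flank, c_flank))
print(charge_difference_rule(all_glu_n_flank, c_flank))
print(positive_inside_rule(all_glu_n_flank, c_flank))
```

In this toy case both rules place the wild-type NH2 terminus in the cytoplasm, while the all-Glu tail flips the charge difference call and leaves the positive inside rule without a prediction.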

In an analysis of charge-altered HN mutants (Parks and
Lamb, 1991), it was proposed that the HN orientation signal
is composed at least in part of a positively charged residue
directly flanking the NH2-terminal side of the S/A. However,
the potential role of positively charged residues located more
distal to the S/A was not tested, and it has been postulated
that these residues may also contribute to the orientation
signal (High and Dobberstein, 1992). Here we report a system-
atic mutational analysis of the NH2-terminal positively
charged residues of the HN protein cytoplasmic tail and their
effect on HN orientation. The data indicate that each of the
3 NH2-terminal Arg residues contributes to the signal direct-
ing the type II topology, since charge-altering mutations in
these residues lead to polypeptides which can adopt the in-
verted Nexo orientation. The ability to invert the HN topology
by these substitutions depends on the distance of the mutation
from the S/A, as well as the charge of the substituting residue,
and the effect of these alterations is enhanced when in the
context of a truncated NH2-terminal domain. These results
are discussed in a model for the topogenic signals of type I,
II, and III proteins.



MATERIALS AND METHODS 

Cells — Monolayer cultures of CV-1 cells were grown in Dulbecco's
modified Eagle's medium containing 10% fetal calf serum as described
(Lamb and Lai, 1982).

Plasmid Construction and Mutagenesis — To construct a pGEM3
plasmid containing a bacteriophage T7 RNA polymerase transcription
terminator (pGem3-term), the appropriate 570-base pair fragment
was excised from pGemex-2 (Promega, Madison, WI) by digestion
with Noel and HindIII and inserted into the Noel and HindIII sites
of pGEM3. A cDNA clone of the SV5 HN protein gene (Hiebert et
al., 1985) was modified previously to encode the addition of a consen-
sus site for N-linked glycosylation (Asn-Ala-Thr) near the NH2
terminus of the protein (HN*; Parks and Lamb, 1991), and a fragment
from this clone (encoding residues 1-81) was used as a source of
starting materials for oligonucleotide-directed mutagenesis after in-
serting into a bacteriophage M13 vector as described (Parks et al.,
1989). Likewise, a cDNA clone encoding a deletion of 14 of 17 NH2-
terminal residues (HNGl; Parks and Lamb, 1990) was used as starting
material for the construction of mutants MVE and MVQ. Following
mutagenesis, DNA fragments were excised from the replicative form
of M13 by digestion with EcoRI and PstI and linked to a DNA
fragment encoding HN residues 82-565 in pGem3-term (Arg substi-
tution mutants) or pGemll (MVR, MVE, and MVQ) such that
mRNA sense transcripts could be produced using the T7 RNA polym-
erase promoter. Nucleotide sequences were confirmed by dideoxynu-
cleotide chain-terminating sequencing (Sanger et al., 1977).

To construct the gene encoding the chimeric protein NBHH, a
cDNA fragment encoding a portion of the influenza virus B/Lee/40
segment 6 gene (bases 1-58; Shaw et al., 1982) was fused to HN using
standard polymerase chain reaction protocols to create the precise
junction of the NB NH2-terminal domain and the HN S/A domain
(Arg/Thr). The construction of the gene encoding the M2/HN chi-
meric protein MgHH has been described previously (Parks et al.,
1989).

Isotopic Labeling of Polypeptides, Immunoprecipitation, N-Glycan-
ase Digestions, Protease Treatment of Microsomal Membranes, and
Polyacrylamide Gel Electrophoresis — Proteins were expressed in CV-
1 cells as described (Parks and Lamb, 1991) using a modified version
of the vaccinia virus/T7 RNA polymerase system of Fuerst et al.
(1986). Vaccinia virus vTF7-3-infected cells were transfected with
pGEM plasmid DNA encoding the HN mutants and radiolabeled
from 3.5 to 4.5 h postinfection with 20-50 µCi/ml Tran[35S] label
(ICN Radiochemicals Inc., Irvine, CA) in Dulbecco's modified Eagle's
medium lacking cysteine and methionine. Radiolabeled cells were
washed in phosphate-buffered saline before lysis in 1% SDS. Immu-
noprecipitation of proteins from cell extracts with antisera to dena-
tured HN (HN antisera) was as described previously (Erickson and
Blobel, 1979; Ng et al., 1990). Deglycosylation of proteins by treatment
with peptide:N-glycosidase F (N-glycanase) was carried out as de-
scribed (Williams and Lamb, 1986). Microsomal membranes were
prepared from vaccinia virus-infected cells by Dounce homogeniza-
tion (Adams and Rose, 1985) and analyzed by trypsin digestion as
described previously (Parks et al., 1989). Samples were analyzed by
SDS-PAGE on 10% polyacrylamide gels, followed by fluorography
(Lamb and Choppin, 1976). Autoradiograms were quantitated using
a Molecular Dynamics model 400 series Phosphorimager (Sunnyvale,
CA), and values represent the average of at least two experiments.

Nomenclature — The nomenclature for type I-III proteins follows 
that of von Heijne and Gavel (1988). For the purposes of discussion, 
the borders of the S/A are operationally defined as the first charged 
residues located on either side of the first hydrophobic membrane-
spanning region. The HN Arg substitution mutants (Fig. 1) are
denoted by a numbering system which is a continuation of that used 
previously (Parks and Lamb, 1991). The HN cytoplasmic domain 
mutants MVR, MVE, and MVQ are named for the 3 residues which 
comprise the tail of these proteins. Hybrid proteins NBHH and 
MgHH are denoted by letters which represent the origin of the NH2-
terminal domain (NB or M2), with the transmembrane domain and
cytoplasmic domain being derived from HN (H). The M2 NH2-



below their position in the HN NH2-terminal domain. The location of the NH2-terminal consensus site for N-linked glycosylation is
highlighted by an asterisk. Vertical arrows indicate the location of the altered Arg residues. Nomenclature for the mutants is described in the
text. Percent Nexo values represent the average of at least two experiments. B, expression of Arg substitution mutants. CV-1 cells infected
with vaccinia virus vTF7-3 were transfected with DNA plasmids encoding the Arg substitution mutants. Polypeptides were radiolabeled from
3.5-4.5 h postinfection with Tran[35S] label, immunoprecipitated with HN antisera, and analyzed by SDS-PAGE. Ncyt and Nexo denote
polypeptides with the WT HN and inverted membrane orientations, respectively.






terminal domain used (Mg) contains a site for addition of N-linked
carbohydrate (Parks et al., 1989).
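
The operational definition of the S/A borders given under Nomenclature lends itself to a short sketch: starting from any position inside the hydrophobic region, scan outward in both directions until the first charged residue is met. The sketch assumes that Asp, Glu, Lys, and Arg are the charged residues (His is not counted). The function name, the hydrophobic segment, and the COOH-terminal residues in the example are hypothetical; only the 19-residue NH2-terminal sequence is taken from the HN WT* sequence shown in Fig. 2.

```python
CHARGED = set("DEKR")  # assumption: His is not treated as charged


def sa_borders(sequence, core_index):
    """Return 0-based indices of the first charged residue on each side
    of the hydrophobic region that contains `core_index`.
    A side with no charged residue is reported as None."""
    if sequence[core_index] in CHARGED:
        raise ValueError("core_index must lie inside the hydrophobic region")

    left = core_index
    while left >= 0 and sequence[left] not in CHARGED:
        left -= 1

    right = core_index
    while right < len(sequence) and sequence[right] not in CHARGED:
        right += 1

    left_border = left if left >= 0 else None
    right_border = right if right < len(sequence) else None
    return left_border, right_border


n_terminal = "MVNATEDAPVRATCRVLFR"   # HN WT* NH2-terminal domain
hydrophobic = "AVLLIVAVLAGLLVIAL"    # hypothetical membrane-spanning segment
c_terminal = "KSTQ"                  # hypothetical COOH-terminal residues
sequence = n_terminal + hydrophobic + c_terminal

left, right = sa_borders(sequence, core_index=25)
print(left + 1, sequence[left], right + 1, sequence[right])
```

With this hypothetical sequence the NH2-terminal border falls on the Arg at position 19 of the WT* tail.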

RESULTS 

Role of HN NH2-terminal Arg Residues in Topogenesis — To
examine experimentally the role of NH2-terminal positively
charged residues in the cytoplasmic tail of a type II integral
membrane protein in directing membrane topology, a series
of charge-altered mutants was produced in which the 3 NH2-
terminal Arg residues of HN were converted individually (Fig.
1A, mutants 10*, 12*, 14*-17*) or in combination (mutants
18*-25*) to a negatively charged glutamate (E) or uncharged
glutamine (Q). As a means of monitoring directly expression
in the Nexo form, each of these mutants also contained a single
site for the addition of an N-linked carbohydrate residue
which had been inserted near the HN NH2 terminus (HN*,
Parks and Lamb, 1991). It was anticipated that glycosylation
of the NH2-terminal domain of HN molecules inverted into
the Nexo topology would result in a species with a slower
electrophoretic mobility than that of unglycosylated HN and
would allow for a distinction between molecules having the
HN Ncyt orientation (four accessible COOH-terminal glyco-
sylation sites), bona fide inversion into the Nexo form (one
accessible NH2-terminal glycosylation site), and unglycosyl-
ated polypeptides which were defective in membrane target-
ing. The HN mutants were expressed to high levels by first
infecting CV-1 cells with a recombinant vaccinia virus which
synthesizes T7 RNA polymerase (Fuerst et al., 1986) and then
transfecting the cells with DNA plasmids encoding the mu-
tants under control of the T7 promoter. After radiolabeling
the cells with 35S-labeled amino acids, polypeptides were
immunoprecipitated from cell extracts using HN antisera and
examined by SDS-PAGE.

As shown in Fig. 1B, each of the charge-altered mutants
was synthesized to varying degrees as a mixture of two major
polypeptides: a species with an electrophoretic mobility
closely matching that of HN WT* (Ncyt) and a faster migrating
species denoted as Nexo. The slight differences in the electro-
phoretic mobilities of the mutant polypeptides most likely
reflect aberrant migration due to their charge differences.
With each mutant, a single species which migrated faster than
the Nexo form was generated after removal of the carbohydrate
residues by N-glycanase treatment, and this indicates that
the two electrophoretic species observed in Fig. 1B are a single
polypeptide chain backbone that differs by glycosylation (data
not shown, but see Parks and Lamb, 1991). Trace amounts of
polypeptides which migrate faster than the Nexo form are
degradation products and have an electrophoretic mobility
distinct from deglycosylated HN (data not shown). Pulse-
labeling followed by chase experiments indicated that the Ncyt
and Nexo forms of mutant proteins were relatively stable (data
not shown), and thus, a comparison of the fraction of each
mutant found in the Nexo form is a valid measure of the
relative effect of each mutation on topogenesis. Quantitation
of several experiments by Phosphorimager analysis of the Ncyt
and Nexo species showed that 13-23% of each of the single
Arg mutants was expressed in the inverted Nexo form (Fig.
1B, left panel).
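
The quantitation described here amounts to expressing the Nexo band as a fraction of the two membrane-integrated forms and averaging over experiments. A minimal sketch follows. The function names and band intensities are hypothetical, and averaging the per-experiment percentages (rather than pooling intensities) is an assumption, since the text states only that values are the average of at least two experiments.

```python
def percent_nexo(ncyt_signal, nexo_signal):
    """Percent of the Ncyt + Nexo signal found in the Nexo band."""
    total = ncyt_signal + nexo_signal
    if total <= 0:
        raise ValueError("no signal in either band")
    return 100.0 * nexo_signal / total


def mean_percent_nexo(experiments):
    """Average the percent Nexo over (Ncyt, Nexo) intensity pairs."""
    if not experiments:
        raise ValueError("at least one experiment is required")
    values = [percent_nexo(ncyt, nexo) for ncyt, nexo in experiments]
    return sum(values) / len(values)


# Hypothetical Phosphorimager intensities for one mutant, two experiments.
example = [(770.0, 230.0), (1540.0, 460.0)]
print(round(mean_percent_nexo(example), 1))
```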

When 2 of the 3 HN NH2-terminal cytoplasmic domain
Arg residues were mutated (Fig. 1B, middle panel, mutants
18*-23*), significantly more of the HN protein was inverted
in the membrane in comparison to the single Arg substitu-
tions. Within each pair of mutants, the substitution of an Arg
residue by a negatively charged Glu resulted in slightly more
efficient expression in the Nexo form than when the Arg was
replaced by an uncharged Gln residue (e.g. compare mutant
18* with 19*). Furthermore, substitution of the Arg located



closest to the S/A led to greater expression in the Nexo form
than did substitution of Arg residues which were more distal
to the S/A, and this is most clearly seen by comparison of
mutants 18* (56% Nexo) and 22* (44% Nexo). The largest
inversion of the HN orientation was seen in the case of mutant
24* in which all of the Arg residues had been converted to
Glu, and ~80% of this protein was oriented in the Nexo form
(Fig. 1B, 24* lane). Taken together, these data suggest that
substitution of each of the NH2-terminal Arg residues leads 
to inversion of the HN type II topology, but that the positions 
closest to the S/A are more sensitive to these charge altera- 
tions. 

To determine if a single Arg residue directly flanking the 
S/A was sufficient to direct the type II topology, a mutant 
HN* protein was constructed (Fig. 2, 26*) in which both Arg 
11 and 15 were converted to uncharged Gln residues, leaving
only Arg 19 which directly flanks the S/A. When the HN
mutant 26* was expressed in CV-1 cells by the vaccinia virus
T7 RNA polymerase system described above, two major poly-
peptides were detected (Fig. 2, - lane), and both of these
forms had an electrophoretic mobility which was slower than
the single polypeptide produced after removal of the carbo-
hydrate residues by treatment with N-glycanase (+ lane).
Quantitation of the relative amounts of the two forms by
Phosphorimager analysis showed that 25% of this protein was
expressed in the Nexo orientation. Although the ability of each
of the other 2 Arg residues to direct the Ncyt orientation by
themselves has not been tested, these data indicate that a
single S/A-flanking positively charged residue is sufficient to
direct 75% of the molecules into the type II topology. Fur-
thermore, a comparison of the HN 26* mutant (25% Nexo)
with the 22* mutant shown in Fig. 1B (44% Nexo) supports
the above contention that the substitution of 2 Arg residues
by a negatively charged Glu leads to greater inversion of HN
than a substitution with uncharged Gln residues.

Effect of Arg Substitutions in the Context of a Truncated
NH2-terminal Domain — In the case of two other type II
membrane proteins, IgCAT (Lipp and Dobberstein, 1986a)
and the asialoglycoprotein receptor (Schmid and Spiess,
1988), truncations of the NH2-terminal cytoplasmic tail resulted
in molecules which were cleaved at a cryptic site in the S/A,
and these processed polypeptides were soluble within the ER
lumen. Analysis of the orientation of a cytoplasmic tail dele-
tion mutant of a related HN protein (from Newcastle disease
virus) suggested that the mutant protein was of mixed orien-
tation (Wilson et al., 1990). In contrast, when an SV5 HN
mutant was constructed and expressed which has the NH2-
terminal domain truncated from 17 residues to the 3-residue
tail MVR, a single major glycosylated species was detected
(Fig. 3, MVR lanes). The available data indicate that the
mutant MVR protein is integrated in the lipid bilayer (Parks
and Lamb, 1990). We do not have a simple explanation for
the difference in result obtained from two related HN cyto-
plasmic tail mutants except that the experiments differed in 
that in vitro and in vivo membrane integration was examined. 
As the data obtained with the MVR mutant were not compli- 
cated by a competing signal peptidase-like cleavage, it pro- 
vided the opportunity to examine the effect of Arg substitu- 
tions within the context of the truncated MVR cytoplasmic 
tail. 

Two mutants were constructed in which the single Arg 
residue in the MVR tail was converted to a Glu (E) or Gln
(Q) residue to produce mutant proteins with NH2-terminal 
domains of MVE and MVQ (Fig. 3). Expression of the MVQ 
mutant using the vaccinia virus system described above (MVQ 
lanes) produced a protein profile which matched that pro- 






Fig. 2. Effect of a single NH2-ter-
minal S/A-flanking Arg residue on
HN topology. CV-1 cells were infected
with vaccinia vTF7-3 and transfected
with a DNA plasmid encoding HN mu-
tant 26*. After radiolabeling with
Tran[35S] label, polypeptides were im-
munoprecipitated from cell extracts with
HN antisera. Immune complexes were
divided into two portions, incubated with
(+) or without (-) N-glycanase, and the
polypeptides were examined by SDS-
PAGE. The NH2-terminal amino acid
sequence of HN WT* is shown with the
location of the 2 Arg residues converted
to Gln to create the 26* mutant indicated
by arrows.



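[Fig. 2, graphic not recoverable from the scan. Gel lanes - and + (N-glycanase) with bands labeled Ncyt and Nexo; the HN WT* NH2-terminal sequence MVNATEDAPVRATCRVLFR is drawn ahead of the S/A; mutant 26*, 25% Nexo.]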



duced by the MVR protein. For both MVR and MVQ, trace 
amounts of a faster migrating species were also observed 
(lanes MVR- and MVQ-), and these species have a different
electrophoretic mobility than deglycosylated MVR and MVQ
(+ lanes). It is thought likely that these species represent
degradation products. In contrast, the MVE protein was syn-
thesized as two major polypeptide species: one which migrated
like the Ncyt form of MVR and a faster-migrating Nexo poly-
peptide with a mobility matching that of the single protein
resulting from N-glycanase treatment (MVE lanes). Alkali 
treatment of microsomal membranes from cells expressing 
the MVE mutant did not remove either of these two protein 
species from the membrane (data not shown). However, the 
formal analysis of showing transmembrane topology by using 
proteases to trim a segment of the cytoplasmic tail could not 
be done because the small size of the cytoplasmic tail pre- 
cludes a shift in electrophoretic mobility of the trimmed form 
on gels. Although these data do not provide formal proof that 
the NH2-terminal domain of the Nexo form of MVE has been
fully translocated across the ER membrane, the strong asso-
ciation of both MVE species with the membrane suggests that
the lack of glycosylation of the Nexo form was due to inversion
into the type III orientation and was not due to defective
integration into the membrane. Quantitation of the two forms
of the MVE protein synthesized during a 1-h labeling period
indicated that 50% of the MVE molecules adopted the in-
verted Nexo form. Mutant MVQ was not inverted in mem-
branes, in contrast to the result when the same membrane-proximal
mutation was made in the full 19-residue WT* tail (mutant
12*) (0 versus 18% in the Nexo form). A possible explanation
is that the loss of the S/A-flanking positive charge in the
MVQ mutant is compensated for by the positive charge con-
tributed by the adjacent NH2 terminus of this truncated



protein. As the MVE mutant contained the same membrane- 
proximal mutation as mutant 10* and yet led to different 
levels of protein inversion (50 versus 23%), it lends further
credence to the notion that other charged residues in the
cytoplasmic domain are important in establishing orientation.

The NH2-terminal Ectodomain of the Type III NB Protein
Can Function as a Type II Cytoplasmic Tail — A compilation 
of the amino acid sequences of known type II membrane 
proteins shows that the vast majority of these proteins (~90%)
have a residue with a positive charge (Arg or Lys) directly 
flanking the NH2-terminal cytoplasmic side of the S/A (for 
compilations see reviews by Paulson and Colley, 1989; Hart-
mann et al., 1989), and the importance of this positive charge
for type II membrane protein topogenesis has been demon- 
strated experimentally (Parks and Lamb, 1991). For the small 
number of naturally existing proteins which are exceptions to 
this correlation and lack an NH2-terminal positively charged 
S/A-flanking residue, it is possible that the presence of a 
negative charge in this position may be compensated for by a 
long stretch of positive charges located more distal (NH2- 
terminal) to the S/A {e.g. neutral endopeptidase, Malfroy et 
o/., 1988); a suggestion made previously in formulating the 
positive inside rule for membrane protein topogenesis (von 
Heijne and Gavel, 1988) and supported by the experimental 
data shown in Fig. 1. In comparison to type II membrane 
proteins, there are relatively few known examples of the 
oppositely orientated type III proteins, but the vast majority 
have a negatively charged Glu or Asp residue directly flanking 
the NH2-terminal side of the S/A (Fig. 4). One of the excep-
tions to this correlation is found with the influenza B virus
NB protein (Williams and Lamb, 1986) which contains a 
single NH2-terminal positively charged residue flanking the 
S/A domain. Earlier work has shown that when a chimeric 






was further examined biochemically. Both NBHH Ncyt and 
Nexo forms were resistant to alkali extraction (data not 
shown), and the NBHH Ncyt form (like HN WT*) was pro- 
tected from digestion by trypsin of microsomal membranes 
whereas the faster migrating NBHH Nexo form was susceptible
to protease digestion (Fig. 5B). Taken together, these data
suggest that the NB NH2-terminal ectodomain is capable of
acting as a cytoplasmic tail when linked to the HN S/A 
domain. 



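[Fig. 3, graphic not recoverable from the scan. Gel lanes - and + (N-glycanase) for MVR, MVE, and MVQ, with bands labeled Ncyt and Nexo and percent Nexo values (MVE, 50). The HN NH2-terminal sequence MVAEDAPVRATCRVLFR is shown above the truncated tails MVR, MVE, and MVQ.]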



Fig. 3. The topological effect of charge alterations is en-
hanced in the context of a truncated HN NH2-terminal do-
main. CV-1 cells infected with vaccinia virus vTF7-3 were trans-
fected with plasmids encoding HN mutants MVR, MVE, or MVQ.
Polypeptides were radiolabeled, immunoprecipitated with HN anti-
sera, digested with (+) or without (-) N-glycanase, and analyzed by
SDS-PAGE as described for Fig. 2. The NH2-terminal sequence of
the mutants is listed below that of HN, with the position of the
altered Arg residue indicated by a vertical arrow.

protein MgHH, which was composed of the NH2-terminal
ectodomain of the type III M2 protein linked to the HN S/A
and COOH-terminal domains, was expressed, the chimera
integrated into membranes in two opposing orientations, but
with the Nexo orientation predominating (Parks and Lamb,
1991 and see Fig. 5). As the NH2-terminal domain of NB has 
a S/A domain-proximal positive charge but is functionally a 
type III ectodomain, it was of interest to determine which 
would be the predominating factor when this portion of the 
NB protein was linked to the HN S/A and COOH-terminal 
domains in a chimeric protein, NBHH. 

The NBHH chimeric protein was expressed in CV-1 cells 
using the vaccinia T7 system and was found as two predomi- 
nant species (Fig. 5A, NBHH lanes): 70% as an Ncyt species 
with a mobility similar to that of the HN WT* protein (WT*
lanes), and 30% as a faster migrating Nexo form. The difference
in electrophoretic mobility between these two forms of NBHH
was due to glycosylation (the Nexo form has two and the Ncyt
form has four glycosylation sites) as only a single NBHH
polypeptide species with identical mobility to deglycosylated
WT* was detected after N-glycanase treatment (NBHH, +
lanes). The membrane orientation of the two NBHH species



DISCUSSION 

All nascent polypeptide chains use a common machinery 
for the targeting to the ER membrane (Walter and Lingappa, 
1986), and yet by comparison very little amino acid identity 
is found among signal sequences. This is illustrated by a 
comparative sequence analysis (von Heijne, 1985) as well as 
experimentally, where it has been shown that seemingly ran-
dom peptide sequences can function in targeting to the secre-
tory pathway (Kaiser et al., 1987; Paterson and Lamb, 1990).
Likewise, the mechanism which follows this targeting to the 
membrane and leads to exclusively one orientation in the lipid 
bilayer must be precise and at the same time degenerate 
topogenic signals must be recognized, as there is little amino 
acid sequence identity among a variety of membrane proteins 
which have the same topology. Recent data indicate that 
charged residues are an important part of the signal for 
determining membrane protein topology (Beltzer et al., 1991;
Haeuptle et al., 1989; Parks and Lamb, 1991).

The data obtained from a systematic analysis of the role of 
each of the HN NH2-terminal Arg residues in determining 
the topology of the protein indicate that several conclusions
can be drawn which address key features of membrane protein 
topology (reviewed in Boyd and Beckwith, 1990; High and 
Dobberstein, 1992) which although speculated on previously 
had not been examined by experiment. First, each of the 3 
HN Arg residues contributes to the signal directing the Ncyt 
topology, with substitutions in the proximal S/A-flanking 
position leading to more inversion into the Nexo form than
substitutions of the distal positions. It was shown previously 
that the S/A-flanking Arg residue is very important in estab- 
lishing orientation. However, the charge alterations of this 
residue did not lead to complete inversion of HN in the 
membrane (Parks and Lamb, 1991). Thus, the observation 
that the inversion of HN was only partial can be explained 
by the presence of the other two NH2-terminal Arg residues, 
and HN can be nearly completely inverted to the Nexo form
(80%) by replacing all 3 Arg residues with Glu. The finding 
that the NB ectodomain can direct the Ncyt topology to 
approximately the same extent as the HN 26* mutant (which 
contains only a single S/A-flanking Arg) lends further support 
to the proposal that the exact sequence of a cytoplasmic tail 
is less critical for the generation of the type II topology than 
the position and number of positive charges (Parks and Lamb, 
1991). Second, the relative importance of a given positively 
charged residue in contributing to the signal for topogenesis 
may depend on the length of the NH2-terminal tail, since HN 
is inverted in the membrane to a greater extent when a charge
alteration is introduced into a truncated tail than when it is
introduced in the context of the full-length NH2-terminal
domain. Likewise, in the case of the asialoglycoprotein recep-
tor (Beltzer et al., 1991) 2 Arg to Asp substitutions lead to
greater inversion in the membrane when introduced in the
context of an NH2-terminal tail which has been truncated
from 40 (3% Nexo) to 11 residues (65% Nexo). Thus, the
orientation signal may depend on the position and charge 
density of the positive charges, and these two factors could 



[Fig. 4 table, largely not recoverable from the scan. Columns: PROTEIN, NH2 flank, TM, COOH flank, reference number. Proteins listed, in scan order: R. cytochrome P450e; IBV 3C; R. MinK; IBV E1 protein; M. LMu-CSF; H. red/green opsin; H. β-adrenergic rec.; H. adrenergic rec.; B. opsin; Y. Sec63p; R. cytochrome P450 red.; H. glycophorin C; B. substance K rec.; UR2 sarcoma virus ros; rotavirus NS28; R. serotonin rec.; Influenza A virus M2. The rows below are legible and are given as scanned.]

PROTEIN                      NH2            COOH           Ref.
H. blue opsin                PNYHIAPRUVYH   RYKKLRQFLNYI   6
Influenza B virus NB         NCTNINPITHIR   KIFIMKNNCTNN   18
H. alpha2-adrenergic rec.    VNGTEAPGGGAR   RALKAPQNLFLV   19
AEV v-erb-B                  GPGLEGCPNGSK   RRRHIVRKRTLR   20



Fig. 4. Comparison of the amino acid sequence of type III proteins. The 12 amino acids flanking the amino- (NH2) and carboxyl-
(COOH) sides of the transmembrane domain (TM) of known type III (Nexo) proteins are listed in one-letter code. The borders of the TM are
operationally defined as the first charged residue on either side of the hydrophobic domain. In some instances (e.g. IBV E1 protein), the first
transmembrane domain of a multispanning membrane protein has been shown to be an uncleaved S/A with the Nexo topology, and the
relevant sequence of these proteins is included for completeness. This list may not be comprehensive, but includes those proteins for which
there is reasonable biochemical evidence for type III topology. IBV, infectious bronchitis virus; LMu-CSF, long form of the multilineage
colony-stimulating factor; rec., receptor; red., reductase; R., rat; M., murine; H., human; B., bovine; Y., yeast; AEV, avian erythroblastosis
virus; UR2, avian sarcoma virus UR2. The references used are: 1) Nelson and Strobel, 1988; 2) Liu and Inglis, 1991; 3) Takumi et al., 1988;
4) Machamer and Rose, 1987; 5) Haeuptle et al., 1989; 6) Nathans et al., 1986; 7) Schofield et al., 1987; 8) Frielle et al., 1987; 9) Nathans and
Hogness, 1983; 10) Feldheim et al., 1992; 11) Porter and Kasper, 1985; 12) High and Tanner, 1987; 13) Masu et al., 1987; 14) Neckameyer et
al., 1985; 15) Bergmann et al., 1989; 16) Julius et al., 1988; 17) Lamb et al., 1985; 18) Williams and Lamb, 1986; 19) Kobilka et al., 1988; 20)
Schatzman et al., 1986.
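The operational definition used in the legend (TM borders at the first charged residue on either side of the hydrophobic domain, then 12 flanking residues) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the toy sequence, and the choice to treat Asp, Glu, Lys and Arg (but not His) as charged are hypothetical and are not taken from the article, and the bordering charged residue is counted here as the first residue of each flank.

```python
# Minimal sketch, not the authors' procedure.
# Assumption: D, E, K and R count as charged; His is treated as uncharged.
CHARGED = set("DEKR")


def flanking_residues(seq, core_start, core_end, n=12):
    """Return (nh2_flank, tm, cooh_flank) for a one-letter protein sequence.

    core_start and core_end are 0-based, inclusive indices of a stretch that
    is known to lie inside the hydrophobic domain (a hypothetical input).
    """
    # Walk outward from the hydrophobic core to the first charged residue.
    left = core_start
    while left >= 0 and seq[left] not in CHARGED:
        left -= 1
    right = core_end
    while right < len(seq) and seq[right] not in CHARGED:
        right += 1

    # left and right now index the bordering charged residues
    # (or -1 / len(seq) if the sequence ends before one is found).
    tm = seq[left + 1:right]
    nh2_flank = seq[max(0, left - n + 1):left + 1]
    cooh_flank = seq[right:right + n]
    return nh2_flank, tm, cooh_flank


# Toy sequence (hypothetical): 8 polar residues, 16 hydrophobic, 8 polar.
toy = "MSDGKTPE" + "LLVVAALLIIGGLLVV" + "RKQSTNDE"
nh2, tm, cooh = flanking_residues(toy, core_start=12, core_end=18)
print(nh2, tm, cooh)
```

With a real protein, `core_start` and `core_end` would have to come from a hydropathy analysis or an annotated TM segment, which this sketch does not attempt.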



explain those few examples of type II proteins which have a
negatively charged residue flanking the NH2-terminal side of
the S/A (e.g. neutral endopeptidase, Malfroy et al., 1988).
Third, the substitution of Arg by a negatively charged Glu
was a more potent inducer of inversion of HN orientation
than was a replacement with an uncharged Gln (i.e. ~8-14%
more in the Nexo form in the double Arg mutants). These data
indicate that the inversion of HN orientation by these Arg
substitutions was not due simply to lack of a positive charge
and suggest that negative charges may act to promote
translocation across the ER membrane. These observations are in
contrast to the finding made for bacteria, where the orientation
of an inner membrane protein can be reversed by the
addition or removal of a single positively charged residue, but
negative charges do not affect topology unless they are present
in very high numbers (Nilsson and von Heijne, 1990;
Andersson et al., 1992).

A comparative analysis of the amino acids which comprise
cleavable signal sequences indicates that these signals are
composed of three domains: a positively charged NH2-terminal
region, a central short stretch of hydrophobic residues,
and a COOH-terminal region containing small polar residues
which defines the site of cleavage by signal peptidase (von
Heijne, 1984, 1985). The uncleaved S/A of a typical type II
protein is structurally very similar to a type I signal sequence,
and it has been shown experimentally that, except for the
presence of a site for cleavage by signal peptidase in the type
I proteins, these two signal sequences are functionally
equivalent. It has been shown that a type II S/A can be converted
to a cleavable signal sequence by NH2-terminal alterations
which expose a cryptic cleavage site (Lipp and Dobberstein,
1986a; Schmid and Spiess, 1988), and conversely it has been
shown that a type I cleavable signal sequence can function as
an uncleaved S/A when modified by extending the NH2-terminal
flanking domain and blocking the cleavage site
(Shaw et al., 1988). Based on these structural and functional
similarities, it has been proposed that the type I and II
proteins share a common mechanism for membrane integration
and topogenesis (von Heijne and Blomberg, 1979; Inouye
and Halegoua, 1980; Engelman and Steitz, 1981; Shaw et al.,
1988), with the nascent polypeptide being presented to the
ER membrane as a loop structure formed by holding both
NH2- and COOH-terminal sides of the signal sequence on the
cytoplasmic side of the lipid bilayer with the NH2-terminal
retention signal composed at least in part of positively charged
residues (reviewed in High and Dobberstein, 1992).

Fig. 5. Expression and biochemical characterization of the NBHH
hybrid protein. A, expression of NBHH. Vaccinia virus vTF7-3-infected
cells were transfected with plasmid DNA encoding HN WT*, NBHH, or
M2HH. Proteins were radiolabeled, immunoprecipitated with HN
antisera, incubated with (+) or without (-) N-glycanase, and
analyzed by SDS-PAGE as described in the legend to Fig. 2. The
positions of the Ncyt and Nexo polypeptides are indicated.
B, proteinase treatment of microsomal membranes from cells
expressing WT* and NBHH. Vaccinia virus vTF7-3-infected cells were
transfected with plasmids encoding HN WT* (lanes 1 and 2) or NBHH
(lanes 3 and 4) and were radiolabeled with Tran[35S]label. Crude
microsomal membranes were prepared and treated with buffer
(lanes 1 and 3) or with trypsin (lanes 2 and 4) as described
previously (Parks et al., 1989). Following centrifugation, samples
were immunoprecipitated with HN antisera and analyzed by SDS-PAGE.
The NH2-terminal sequence of HN WT* and of the chimeric NBHH and
M2HH proteins is shown below, with a cross-hatched box and
horizontal lines denoting the HN S/A and COOH-terminal ectodomain,
respectively. The locations of the consensus sites for N-linked
glycosylation are highlighted by asterisks.

Mutant    NH2-terminal sequence (preceding the HN S/A)    % Nexo
WT*       MVNATEDAPVRATCRVLFR
NBHH      MNHATFNCTNINPITHIR                              30
M2HH      MSNLTEVETPIRNEWGCRCNDSSD                        65

In contrast to the establishment of type II protein orientation,
the rules determining type III protein orientation remain
enigmatic. Type III proteins depend on SRP for membrane
targeting and integration (Hull et al., 1988) and may be
presented initially to the membrane as a loop structure (for a
schematic diagram, see review by High and Dobberstein,
1992), but lacking the cytoplasmic retention signal the NH2
terminus of these proteins would be translocated across the
bilayer. As initially proposed to explain the topogenesis of the
first Nexo transmembrane domain of opsin (Audigier et al., 1987),
the NH2-terminal region of all nascent membrane proteins (type
I-III) may bind to an unrecognized factor to form the common
loop structure, but for type III proteins this binding may be
more readily dissociated, leading to "flipping" of the NH2
terminus across the ER membrane. The ability to vary the
inversion of HN into the Nexo form by NH2-terminal charge
alterations may reflect the degree of dissociation of the mutant
NH2 terminus from this putative binding factor, with
positively charged residues being held more tightly than
negatively charged residues. In the case of Escherichia coli, the
acidic SecA protein appears to interact directly with positive
charges in the signal sequence of nascent type I proteins
during translocation across the cytoplasmic membrane (Akita
et al., 1990). Although a protein analogous to SecA has not
been identified to date in eukaryotic cells, recent cross-linking
and reconstitution studies have led to the identification of
several ER membrane proteins which may be directly involved
in forming an aqueous pore across membranes (reviewed in
Rapoport, 1992). Thus, these proteins are candidates for
interacting with the NH2-terminal positive charges of a nascent
polypeptide chain. Alternatively, the type III proteins
may employ a distinct topogenic mechanism, whereby the
NH2 terminus is not bound to form the transient loop structure
but is presented to the ER membrane in a "head-on"
configuration.

The experimental data described here indicate that it is
possible to convert a type II protein into the Nexo topology by
NH2-terminal charge alterations, and thus these data address
indirectly the nature of the topogenic signals of naturally
occurring type III proteins. Although experimentally a type
III protein can be converted to a type II protein by complete
exchanges of S/A-flanking domains (Parks and Lamb, 1991),
a direct systematic testing of the role of individual proximal
and distal charges in generating the type III topology has yet
to be performed. In the M2HH chimera, the type III M2
ectodomain, which lacks an S/A-flanking positively charged
residue, directed 65% of the molecules in the type III
orientation, whereas in the NBHH chimera the type III NB
ectodomain, which contains a positively charged residue flanking
the S/A domain, directed 70% of the molecules in the opposing
HN type II orientation. Thus, the signal for establishing type
III topology may be complex and consist of the NH2-terminal
ectodomain in conjunction with the S/A domain, and the
artificial dividing of two parts of the signal in the chimera
may explain the difference in the ability of the M2 and NB
type III ectodomains to function in directing the Nexo topology
when linked to the HN S/A (M2HH and NBHH). This may
also explain the observation that a chimeric protein can adopt
dual orientations, a problem not found with naturally existing
proteins. In the case of the type III cytochrome P-450 protein,
it has been proposed that membrane topology is determined
by a balance between the NH2-terminal charged residues and
the length of the hydrophobic signal (Sakaguchi et al., 1992),
with proteins in the Nexo topology requiring a longer
hydrophobic stretch and fewer positive charges. Therefore, for type
III proteins overlapping signals contributed by both the S/A
and NH2-terminal domains may act together to assure the
precise steps in establishing membrane orientation.

Acknowledgments — We thank Margaret Shaughnessy for excellent 
technical assistance and Zhi-hai Ma and Oscar Vallea for constructing 
some of the HN mutants as D99 projects at Northwestern University. 

REFERENCES

Adams, G. A., and Rose, J. K. (1985) Mol. Cell. Biol. 5, 1442-1448
Akita, M., Sasaki, S., Matsuyama, S.-I., and Mizushima, S. (1990) J. Biol. Chem. 265, 8164-8169
Andersson, H., Bakker, E., and von Heijne, G. (1992) J. Biol. Chem. 267, 1491-1496
Audigier, Y., Friedlander, M., and Blobel, G. (1987) Proc. Natl. Acad. Sci. U. S. A. 84, 5783-5787
Beltzer, J. P., Fiedler, K., Fuhrer, C., Geffen, I., Handschin, C., Wessels, H. P., and Spiess, M. (1991) J. Biol. Chem. 266, 973-978
Bergmann, C. C., Maass, D., Poruchynsky, M. S., Atkinson, P. H., and Bellamy, A. R. (1989) EMBO J. 5, 1543-1650
Blobel, G. (1980) Proc. Natl. Acad. Sci. U. S. A. 77, 1496-1500
Boyd, D., and Beckwith, J. (1990) Cell 62, 1031-1033
Engelman, D. M., and Steitz, T. A. (1981) Cell 23, 411-422
Erickson, A. H., and Blobel, G. (1979) J. Biol. Chem. 254, 11771-11774
Feldheim, D., Rothblatt, J., and Schekman, R. (1992) Mol. Cell. Biol. 12, 3288-3296
Frielle, T., Collins, S., Daniel, K. W., Caron, M. G., Lefkowitz, R. J., and Kobilka, B. K. (1987) Proc. Natl. Acad. Sci. U. S. A. 84, 7920-7924
Fuerst, T. R., Niles, E. G., Studier, F. W., and Moss, B. (1986) Proc. Natl. Acad. Sci. U. S. A. 85, 8122-8126
Haeuptle, M.-T., Flint, N., Gough, N. M., and Dobberstein, B. (1989) J. Cell Biol. 108, 1227-1236
Hartmann, E., Rapoport, T. A., and Lodish, H. F. (1989) Proc. Natl. Acad. Sci. U. S. A. 86, 5786-5790
Hiebert, S. W., Paterson, R. G., and Lamb, R. A. (1985) J. Virol. 54, 1-6
High, S., and Dobberstein, B. (1992) Curr. Opin. Cell Biol. 4, 681-686
High, S., and Tanner, M. J. A. (1987) Biochem. J. 243, 277-280
Hull, J. D., Gilmore, R., and Lamb, R. A. (1988) J. Cell Biol. 106, 1489-1498
Inouye, M., and Halegoua, S. (1980) CRC Crit. Rev. Biochem. 7, 339-371
Julius, D., MacDermott, A. B., Axel, R., and Jessell, T. M. (1988) Science 241, 558-564
Kaiser, C. A., Preuss, D., Grisafi, P., and Botstein, D. (1987) Science 235, 312-317
Kobilka, B. K., Matsui, H., Kobilka, T. S., Yang-Feng, T. L., Francke, U., Caron, M. G., Lefkowitz, R. J., and Regan, J. W. (1988) Science 238, 660-666
Lamb, R. A., and Choppin, P. W. (1976) Virology 74, 504-519
Lamb, R. A., and Lai, C.-J. (1982) Virology 129, 237-256
Lamb, R. A., Zebedee, S. L., and Richardson, C. D. (1985) Cell 40, 627-633
Lipp, J., and Dobberstein, B. (1986a) Cell 46, 1103-1112
Lipp, J., and Dobberstein, B. (1986b) J. Cell Biol. 102, 2169-2176
Liu, D. X., and Inglis, S. C. (1991) Virology 185, 911-917
Machamer, C. E., and Rose, J. K. (1987) J. Cell Biol. 105, 1205-1214
Malfroy, B., Kuang, W.-J., Seeburg, P. H., Mason, A. J., and Schofield, P. R. (1988) FEBS Lett. 229, 206-210
Masu, Y., Nakayama, K., Tamaki, H., Harada, Y., Kuno, M., and Nakanishi, S. (1987) Nature 329, 836-838
Nathans, J., and Hogness, D. S. (1983) Cell 34, 807-814
Nathans, J., Thomas, D., and Hogness, D. S. (1986) Science 232, 193-202
Neckameyer, W. S., and Wang, L.-H. (1985) J. Virol. 53, 879-884
Nelson, D. R., and Strobel, H. W. (1988) Proc. Natl. Acad. Sci. U. S. A. 263, 6038-6060
Nilsson, I., and von Heijne, G. (1990) Cell 62, 1135-1141
Ng, D. T. W., Hiebert, S. W., and Lamb, R. A. (1990) Mol. Cell. Biol. 10, 1989-2001
Nothwehr, S. F., and Gordon, J. I. (1989) J. Biol. Chem. 264, 3979-3987
Parks, G. D., Hull, J. D., and Lamb, R. A. (1989) J. Cell Biol. 109, 2023-2032
Parks, G. D., and Lamb, R. A. (1990) J. Virol. 64, 3605-3616
Parks, G. D., and Lamb, R. A. (1991) Cell 64, 777-787
Paterson, R. G., and Lamb, R. A. (1990) J. Cell Biol. 110, 999-1011
Paulson, J. C., and Colley, K. J. (1989) J. Biol. Chem. 264, 17615-17618
Porter, T. D., and Kasper, C. B. (1985) Proc. Natl. Acad. Sci. U. S. A. 82, 973-977
Rapoport, T. A. (1992) Science 252, 931-936
Sakaguchi, M., Tomiyoshi, R., Kuroiwa, T., Mihara, K., and Omura, T. (1992) Proc. Natl. Acad. Sci. U. S. A. 89, 16-19
Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
Schatzman, R. C., Evan, G. I., Privalsky, M. L., and Bishop, J. M. (1986) Mol. Cell. Biol. 6, 1329-1333
Schmid, S. R., and Spiess, M. (1988) J. Biol. Chem. 263, 16886-16891
Schneider, C., Owen, M. J., Banville, D., and Williams, G. W. (1984) Nature 311, 675-678
Schofield, P. R., Rhee, L. M., and Peralta, E. G. (1987) Nucleic Acids Res. 15, 3636
Shaw, M. W., Lamb, R. A., Erickson, B. W., Briedis, D. J., and Choppin, P. W. (1982) Proc. Natl. Acad. Sci. U. S. A. 79, 6817-6821
Shaw, A. S., Rottier, P. J. M., and Rose, J. K. (1988) Proc. Natl. Acad. Sci. U. S. A. 85, 7592-7596
Spiess, M., and Lodish, H. F. (1986) Cell 44, 177-186
Takumi, T., Ohkubo, H., and Nakanishi, S. (1988) Science 242, 1042-1045
von Heijne, G. (1984) EMBO J. 3, 2315-2318
von Heijne, G. (1985) J. Mol. Biol. 184, 99-105
von Heijne, G. (1986) EMBO J. 5, 3021-3027
von Heijne, G., and Blomberg, C. (1979) Eur. J. Biochem. 97, 176-181
von Heijne, G., and Gavel, Y. (1988) Eur. J. Biochem. 174, 6711-6718
Walter, P., and Lingappa, V. R. (1986) Annu. Rev. Cell Biol. 2, 499-516
Williams, M. A., and Lamb, R. A. (1986) Mol. Cell. Biol. 6, 4317-4328
Wilson, C., Gilmore, R., and Morrison, T. (1990) Mol. Cell. Biol. 10, 449-457
Zerial, M., Huylebroeck, D., and Garoff, H. (1987) Cell 48, 147-155








Exhibit 27 



Cell, Vol. 64, 777-787, February 22, 1991, Copyright © 1991 by Cell Press



Topology of Eukaryotic Type II Membrane Proteins:
Importance of N-Terminal Positively Charged
Residues Flanking the Hydrophobic Domain



Griffith D. Parks and Robert A. Lamb 

Department of Biochemistry, Molecular Biology
and Cell Biology 

and Howard Hughes Medical Institute 
Northwestern University 
Evanston, Illinois 60208-3500 

Summary 

We have tested the role of different charged residues
flanking the sides of the signal/anchor (S/A) domain of
a eukaryotic type II (NcytCexo) integral membrane protein
in determining its topology. The removal of positively
charged residues on the N-terminal side of the
S/A yields proteins with an inverted topology, while
the addition of positively charged residues to only the
C-terminal side has very little effect on orientation.
Expression of chimeric proteins composed of domains
from a type II protein (HN) and the oppositely oriented
membrane protein M2 indicates that the HN N-terminal
domain is sufficient to confer a type II topology
and that the M2 N-terminal ectodomain can direct a
type II topology when modified by adding positively
charged residues. These data suggest that eukaryotic
membrane protein topology is governed by the presence
or absence of an N-terminal signal for retention
in the cytoplasm that is composed in part of positive
charges.

Introduction 

The signals that direct membrane protein topology are
precise, as it appears that almost all naturally occurring
membrane proteins adopt only one final orientation, which
is determined by the amino acid sequence of the polypeptide
chain (Blobel, 1980). Integral membrane proteins that
span the lipid bilayer a single time can be classified as
type I, II, or III (nomenclature of von Heijne, 1988), and this
is based on the nature of their hydrophobic domains and
their orientation in membranes. Type I proteins contain an
N-terminal cleavable signal sequence that targets the nascent
polypeptide to the endoplasmic reticulum (ER)
membrane (reviewed in Walter and Lingappa, 1986). The
final NexoCcyt topology of type I proteins is determined by
cleavage in the ER lumen of the N-terminal signal sequence
by signal peptidase (Evans et al., 1986), and their
translocation across the membrane is halted by a C-terminal
hydrophobic stop-transfer region that anchors the
polypeptide in the lipid bilayer. Type I proteins constitute
the major class of integral membrane proteins that span
the membrane once. The type II proteins do not contain
a cleavable signal sequence, but instead have a long
stretch of hydrophobic residues, the signal/anchor domain
(S/A), which serves the dual function of targeting and
anchoring the polypeptide in the ER membrane with an
NcytCexo topology. Examples of type II proteins include
the transferrin receptor (Schneider et al., 1984),
HLA-associated invariant chain (Strubin et al., 1984),
asialoglycoprotein receptor (Spiess and Lodish, 1985), and the
paramyxovirus hemagglutinin-neuraminidase (HN) and
SH proteins (Hiebert et al., 1985a, 1985b).

The type III proteins contain an internal uncleaved S/A
but adopt the NexoCcyt orientation: the known examples
constitute a small group including gp74 v-erbB of avian
erythroblastosis virus (Schatzman et al., 1986), erythrocyte
sialoglycoprotein β (High and Tanner, 1987), cytochrome
P450 (Sato et al., 1990), the influenza A virus
M2 protein, and the influenza B virus NB protein (Lamb et
al., 1985; Williams and Lamb, 1986). Recent experimental
evidence has provided support for the earlier speculation
(von Heijne and Blomberg, 1979; Inouye and Halegoua,
1980; Engelman and Steitz, 1981) that the nascent polypeptide
chain of type I and II proteins is inserted into the
ER membrane by a common mechanism involving a hairpin
loop structure, and that the final topology of these proteins
is determined by the presence or absence, in type
I and type II proteins, respectively, of a site in the N-terminal
hydrophobic domain that can be cleaved by signal
peptidase (Lipp and Dobberstein, 1986a; Shaw et al.,
1988). Although the type III proteins, such as the influenza
virus M2 protein, appear to share the common SRP-mediated
ER targeting mechanism found with type I and II
proteins (Lipp and Dobberstein, 1986b; Hull et al., 1988),
the detailed steps of their membrane insertion have not
been characterized.

We are interested in determining the signals that direct
the opposing membrane topologies of eukaryotic type II
and type III integral membrane proteins and have used the
HN and M2 proteins as models. That the hydrophobic nature
of the residues composing an S/A appears to be the
only structural requirement for this domain to function in
targeting and anchoring a polypeptide (Zerial et al., 1987)
and that it has been shown that an S/A domain can be
inverted in membranes without loss of function (Parks et al.,
1989) suggest that sequences outside of the S/A of the
type II and III proteins direct membrane orientation. Analysis
of the sequences of known membrane proteins led to
the proposal of the "positive inside rule" (von Heijne,
1986a; von Heijne and Gavel, 1988), in which membrane
proteins orientate themselves with the most positively
charged end in the cytoplasm. However, based on a recent
comparison of the sequences of eukaryotic type II
and III membrane proteins, a strong correlation between
the sum of the charges flanking the S/A of a protein and
its membrane topology has been identified (Hartmann et
al., 1989). It was proposed that the net charge of the 15
residues flanking the two sides of the S/A directs the orientation
of a nascent polypeptide and that the domain with
the more positive overall charge is retained in the cytoplasm.
Thus, this "charge difference" hypothesis predicts
that it is not the absolute number of positive or negative
charges flanking the S/A but the sum of the flanking
charges that is important for directing the topology of the
protein (Hartmann et al., 1989).
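The "charge difference" calculation described above can be sketched as follows. This is a hedged illustration, not the procedure of Hartmann et al.: the function names are hypothetical, Lys and Arg are counted as +1 and Asp and Glu as -1 (His is assumed neutral), and the two flanking sequences are the WT HN residues as best they can be read from the damaged scan of Figure 1A, so the exact residues are an assumption.

```python
# Minimal sketch of a charge-difference calculation (assumptions noted above).
POSITIVE = set("KR")
NEGATIVE = set("DE")


def net_charge(segment):
    """Sum of +1 for K/R and -1 for D/E over a one-letter sequence."""
    return sum((aa in POSITIVE) - (aa in NEGATIVE) for aa in segment)


def charge_difference(n_flank, c_flank, window=15):
    """Return (N-side charge, C-side charge, C minus N).

    n_flank ends at the S/A and c_flank starts at the S/A, so the window is
    taken from the end of n_flank and the start of c_flank.
    """
    n_q = net_charge(n_flank[-window:])
    c_q = net_charge(c_flank[:window])
    return n_q, c_q, c_q - n_q


def predicted_cytoplasmic_side(n_flank, c_flank, window=15):
    """The hypothesis retains the more positively charged side in the cytoplasm."""
    n_q, c_q, _ = charge_difference(n_flank, c_flank, window)
    if n_q > c_q:
        return "N"
    if c_q > n_q:
        return "C"
    return "undetermined"


# WT HN flanks as read from Figure 1A (an assumed transcription).
HN_N_FLANK = "MVAEDAPVRATCRVLFR"
HN_C_FLANK = "ESLITQKQIMSQAGSTG"

print(charge_difference(HN_N_FLANK, HN_C_FLANK))
print(predicted_cytoplasmic_side(HN_N_FLANK, HN_C_FLANK))
```

For these assumed sequences the N-terminal side carries the more positive net charge, which is consistent with the type II (N-terminus cytoplasmic) orientation of wild-type HN.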



Cell 
778 



We report here experiments designed to examine the
role of charged residues in determining topology. An HN
cDNA clone was systematically altered by site-specific
mutagenesis to introduce negatively charged residues
into the N-terminal flanking region and positively charged
residues into the C-terminal side. Analysis of the topology
of the altered proteins expressed in CV-1 cells emphasizes
the importance of N-terminal positive charges in the
establishment of the HN topology. From analysis of the
orientation of various chimeric molecules constructed
from domains of HN and M2 we suggest that the establishment
of the type II NcytCexo topology is dependent on
the presence of an N-terminal cytoplasmic retention signal,
which is in part composed of positively charged
residues, and that the opposing HN and M2 orientations
are governed by the presence or absence of this N-terminal
signal in these two polypeptides.

Results 

Construction of Charge-Altered HN Mutants 

To determine if a charge difference between the N-terminal
and C-terminal side of the S/A domain is a factor in
establishing type II membrane topology, the cDNA clone
of the model type II protein HN was systematically mutated
by oligonucleotide-directed mutagenesis to generate a series
of charge-altered HN proteins (Figure 1A). In this series
of mutants, HN residues flanking both sides of the S/A
domain were changed separately or in combination such
that the sum of the charges within the N-terminal 15 residues
was progressively more negative than that of the 15
C-terminal flanking residues. The charge difference rules
(Hartmann et al., 1989) predict that each of these HN mutants
should adopt an inverted NexoCcyt topology and, because
the only sites for N-linked glycosylation are in the
C-terminal ectodomain (Hiebert et al., 1985a; Ng et al.,
1990), these inverted molecules should be readily distinguishable
from those proteins with the normal HN orientation
by their lack of glycosylation.
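The glycosylation readout relies on consensus sites for N-linked glycosylation, which are generally taken to be Asn-X-Ser/Thr with X being any residue except Pro. A short Python sketch for locating such sites is given here as an illustration only; the function name and the toy sequence are hypothetical and do not come from the article, and the presence of a consensus site does not by itself show that the site is used.

```python
import re

# Assumption: the usual N-X-S/T sequon with X != Pro. A lookahead is used so
# that overlapping sites (for example "NNSS") are all reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(seq):
    """Return (1-based position of the Asn, tripeptide) for each consensus site."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq)]


# Toy sequence (hypothetical): one site, one Pro-blocked site, two overlapping sites.
print(find_sequons("MANVTGNPSANNSS"))
```

In the experiments described here, only sites that reach the ER lumen can be glycosylated, so a protein whose consensus sites all lie in the C-terminal domain stays unglycosylated if that domain remains cytoplasmic.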

Expression of Charge-Altered HN Proteins 

To obtain a high level of expression of the mutant HN proteins,
the vaccinia virus system of Fuerst et al. (1986) was
employed. CV-1 cells infected with vaccinia virus vTF7-3,
which expresses the bacteriophage T7 RNA polymerase,
were transfected with plasmid DNAs encoding the mutant
proteins under control of the T7 RNA polymerase promoter.
After radiolabeling the cells for 1 hr with Tran[35S]label,
proteins were immunoprecipitated from cell extracts
with HN antisera and examined by SDS-polyacrylamide
gel electrophoresis (SDS-PAGE). Using this expression
system, wild-type (WT) HN was synthesized as a single
polypeptide of Mr ≈ 68,000 (Figure 1B, lane WT).

Expression of the HN mutants produced a protein profile
that was significantly different from that of WT HN. The
charge-altered mutants were synthesized to varying
degrees as a mixture of two major polypeptides: a species
with an electrophoretic mobility similar to that of WT HN,
designated Ncyt, and a faster-migrating form (Mr ≈ 50,000,
Figure 1B, lanes 1-9), designated Nexo. Minor polypeptide
species migrating faster than the Nexo species are
thought to be degradation products of WT HN as described
previously (Ng et al., 1989). After treatment of the
proteins with peptide: N-glycosidase F (N-glycanase),
each of the mutants was detected as a single polypeptide
with an electrophoretic mobility similar to that of the Nexo
protein (not shown), and this suggests that the Ncyt and
Nexo forms are a single polypeptide species that differ
from each other by N-linked glycosylation. Further biochemical
evidence that the Ncyt and Nexo forms of altered
HN molecules are integral membrane proteins with opposing
orientations is presented below.

Pulse-labeling followed by chase protocols indicated
that within a 1 hr period all the forms of the mutant HN
were stable (data not shown), and thus quantitation of the
amounts of the species that accumulate is a reasonable
assay for determining the amounts in each orientation.
Densitometric scanning of autoradiograms from several
experiments indicated that the fraction of HN mutants 1
and 2 found in the Nexo form was 12% and 30%, respectively
(Figure 1A, % Nexo), which suggests that the introduction
of negatively charged residues to the N-terminal
side of the S/A has an important effect on membrane
orientation. In contrast, only 5%-6% of the total HN protein
was synthesized as the Nexo species in the case of
mutants 3-5, which encode a normal N-terminal domain
but are modified by the addition of positively charged
residues to the C-terminal side of the S/A. Combinations
of N- and C-terminal substitutions (mutants 6-9) had the
largest effect on HN orientation, as an increasing fraction
of the total HN protein was synthesized as the Nexo species
when N-terminally altered mutants 1 and 2 were further
modified by the addition of positive charges to the
C-terminal side of the S/A (Figure 1B, lanes 6-9). A minor
species of unknown origin that migrates between the Ncyt
and Nexo forms was immunoprecipitated from cells expressing
the most highly charge-altered proteins (lanes
7-9), but its presence does not affect the interpretation of
the data. The inversion to the Nexo form reached a maximum
value of 75% with mutant 9, which encoded N- and
C-terminal net charges of -2 and +4, respectively.

These data suggest that the normal HN orientation can
be disrupted by alterations in charged residues flanking
the S/A domain, and proteins can be produced that adopt
more than one orientation. However, our data do not fulfil
the predictions of the charge difference rules (Hartmann
et al., 1989), as only proteins containing mutations on the
N-terminal side of the S/A (mutants 1, 2, and 6-9) were
significantly inverted in the membrane and the topology of
the mutants altered only on the C-terminal side of the S/A
(mutants 3-5) remained largely unaltered.
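The comparison behind this conclusion can be made explicit with a short tabulation. The sketch below is illustrative only: the dictionary and function names are hypothetical, the charges and percentages are those stated in the text or read from Figure 1A, and because the scan of the figure is damaged the entries for mutants 4, 6, 7 and 8 should be treated as an assumed transcription rather than verified values.

```python
# Values as read from Figure 1A and the Results text (see caveat above).
# mutant: (N-terminal net charge, C-terminal net charge, % Nexo)
FIG_1A = {
    1: (-1, 0, 12),
    2: (-2, 0, 30),
    3: (+1, +2, 5),
    4: (+1, +3, 5),
    5: (+1, +4, 6),
    6: (-1, +3, 20),
    7: (-1, +4, 50),
    8: (-2, +3, 50),
    9: (-2, +4, 75),
}
WT_N_CHARGE = 1  # net charge of the unaltered N-terminal flank


def summarize(table):
    """Return one record per mutant with the C-minus-N charge difference."""
    rows = []
    for mutant, (n_q, c_q, pct) in sorted(table.items()):
        rows.append({
            "mutant": mutant,
            "delta": c_q - n_q,
            "n_altered": n_q != WT_N_CHARGE,
            "pct_nexo": pct,
        })
    return rows


rows = summarize(FIG_1A)
c_only = [r for r in rows if not r["n_altered"]]
n_altered = [r for r in rows if r["n_altered"]]

# Every mutant has a positive charge difference, so a pure charge-difference
# rule would predict inversion for all nine.
print(all(r["delta"] > 0 for r in rows))
# Highest inversion among mutants changed only on the C-terminal side.
print(max(r["pct_nexo"] for r in c_only))
# Lowest inversion among mutants changed on the N-terminal side.
print(min(r["pct_nexo"] for r in n_altered))
```

On these values, mutant 5 has a larger charge difference than mutant 2 yet is inverted far less, which is the pattern the text describes: the N-terminal charges, not the difference alone, track with inversion.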

Biochemical Evidence for the Orientation 
of Charge-Altered HN Proteins 

It was inferred from the electrophoretic mobility of the
Nexo protein that the C-terminal domain of these molecules,
which contains the sites for N-linked glycosylation,
had not been translocated across the ER membrane.
However, it was important to provide evidence that the
function of the S/A domain had not been abrogated and


Membrane Protein Orientation Signals 
779 



A.

                  CHARGE (a)
MUTANT       N        C      Difference (b)    % Nexo (c)
WT          +1        0          -1                0
1           -1        0          +1               12
2           -2        0          +2               30
3           +1       +2          +1                5
4           +1       +3          +2                5
5           +1       +4          +3                6
6           -1       +3          +4               20
7           -1       +4          +5               50
8           -2       +3          +5               50
9           -2       +4          +6               75

WT HN flanking sequence: MVAEDAPVRATCRVLFR [S/A] ESLITQKQIMSQAGSTG
[The substitution positions for mutants 1-9 are not legible in this copy.]

B.

MUTANT lanes: WT, 1, 2, 3, 4, 5, 6, 7, 8, 9




Figure 1. Construction and Expression of Charge-Altered HN Proteins

(A) Schematic diagram of the charge-altered HN proteins. The 17 amino acid residues flanking the N- (left) and C-terminal (right) sides of the S/A
(cross-hatched box) of WT HN are shown. Solid horizontal lines denote sequence identity of mutants 1-9 with WT HN, and substitutions are shown
below their position in the HN sequence. a: sum of charged residues within the 15 amino acids flanking the S/A domain; N, N-terminal; C, C-terminal.
b: difference in the sum of charged residues on N- and C-terminal sides of S/A. c: percentage of the total HN protein accumulated in the
unglycosylated Nexo form after a 1 hr labeling period.

(B) Expression of charge-altered HN proteins. CV-1 cells infected with vaccinia virus vTF7-3 were transfected with plasmids encoding WT HN or
mutants 1-9 and radiolabeled for 1 hr with Tran[35S]label. Proteins were immunoprecipitated from cell lysates with HN antisera and analyzed by
SDS-PAGE. Ncyt and Nexo denote forms of HN as described in the text.






A. Lanes: WT, 2, 6, 7 (pellet, P, and supernatant, S, for each)

B. Lanes: WT, 2, 7 (- and + trypsin for each); the position of Nexo is marked



Figure 2. Biochemical Analysis of Microsomal Membranes from Cells Expressing Charge-Altered HN Proteins

Vaccinia virus vTF7-3-infected cells were transfected with plasmids encoding WT HN or with mutants 2, 6, or 7. Cells were radiolabeled with
Tran[35S]label from 3.5-4.5 hr posttransfection, and crude microsomal membranes were prepared.

(A) Alkali fractionation. Microsomal membranes were incubated for 30 min at 4°C with buffer (pH 11) and fractionated by centrifugation. Equal portions
of the resulting pellet (P) or supernatant (S) were neutralized, immunoprecipitated with HN antisera, and the polypeptides were analyzed by
SDS-PAGE.

(B) Protease digestion. Samples were treated with buffer (- lanes) or with 20 µg/ml trypsin (+ lanes). After 45 min at 37°C, microsomal membranes
were isolated by centrifugation, and the proteins were immunoprecipitated with HN antisera before analysis by SDS-PAGE. Ncyt and Nexo are forms
of HN as described in the text.



that these unglycosylated molecules were stably anchored in the
membrane (NexoCcyt orientation) and were not soluble cytoplasmic
proteins. Microsomal membranes were prepared from vTF7-3-infected
cells that had been transfected with plasmids encoding WT HN or
mutants 2, 6, or 7, and the microsomes were treated with pH 11 buffer.
Under these conditions, integral membrane proteins remain associated
with the lipid bilayer and after centrifugation are found in the
pellet fraction, while soluble proteins are found in the supernatant
fraction (Steck and Yu, 1973). As shown in Figure 2A, both the Ncyt
and the Nexo protein species fractionated like WT HN, as the majority
of the protein was detected in the pellet fraction (P) and only trace
amounts were found in the supernatant (S). Thus, these data strongly
suggest that the function of the S/A domain in targeting the proteins
to the ER and anchoring the proteins in membranes had not been
affected.

To provide direct biochemical evidence for the topology of the mutant
proteins, microsomal membranes isolated from vaccinia vTF7-3-infected
cells expressing WT HN or several representative mutants were treated
with trypsin, and the protected protein fragments were analyzed by
immunoprecipitation with HN antisera and SDS-PAGE. Microsomal
membranes from cells expressing WT HN or mutants 2 and 7 protected the
Ncyt species from trypsin digestion, whereas the Nexo form was
accessible to added protease (Figure 2B, + lanes). These results
suggest that the Ncyt species has a type II orientation and that the
vast majority of the Nexo polypeptide chain is located on the
cytoplasmic side of the membrane.

To provide evidence that the N-terminal domain of the HN Nexo species
was translocated across the ER membrane and not held in a loop
formation, a site for N-linked glycosylation was added to the
N-terminal domain of WT HN and two of the charge-altered mutants by
site-specific mutagenesis (Figure 3). It was anticipated that
glycosylation of the N-terminal domain of the Nexo species would
result in a slower electrophoretic mobility than the unglycosylated
Nexo protein, while the mobility of the Ncyt species would not be
altered. Vaccinia virus vTF7-3-infected cells were transfected with
plasmids encoding these N-terminal mutants and labeled for 1 hr with
Tran[35S]label. Proteins were immunoprecipitated from cell extracts,
incubated with (+) or without (-) N-glycanase, and examined by
SDS-PAGE. The mutant HN WT* contains the new N-terminal site for
N-linked glycosylation, and expression of HN WT* results in the
synthesis of a single major polypeptide (Figure 3, WT* lanes). Thus,






Figure 3. Glycosylation of the Mutant HN N-Terminal Domains

Vaccinia vTF7-3-infected CV-1 cells were transfected with plasmid DNAs
encoding derivatives of the WT and mutant HN proteins altered to
contain an N-terminal glycosylation site (*). Polypeptides were
radiolabeled from 3.5-4.5 hr posttransfection with Tran[35S]label and
immunoprecipitated with HN antisera. Immune complexes were divided
into two portions, incubated with (+) or without (-) N-glycanase, and
the polypeptides were analyzed by SDS-PAGE. The fraction of the total
HN protein in the Nexo orientation is shown (% Nexo). The N-terminal
amino acids in the mutants are listed with solid horizontal lines
indicating sequence identity with HN WT*. Note that HN WT* contains
two extra N-terminal residues (N and T) to create the site for
N-linked glycosylation. S/A, HN signal/anchor domain.



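[Figure 3 diagram: N-terminal sequences of HN WT*
(MVNATEDAPVRATCRVLFR, followed by the S/A domain) and of mutants 1*,
2*, and 10*-13*, with % Nexo values of 10, 30, 10, 0, 10, and 10 for
mutants 1*, 2*, 10*, 11*, 12*, and 13*, respectively.]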



the addition of the two new amino acid residues to form the N-terminal
glycosylation site did not influence HN orientation. Two polypeptide
species were identified with HN mutants 1* and 2* (- lanes, Ncyt and
Nexo), both of which had a slower electrophoretic mobility than the
single polypeptide species found after treatment of the proteins with
N-glycanase (+ lanes). The small mobility difference of ~5 kd between
the Nexo species (- lanes) and the deglycosylated protein (+ lanes)
suggested that the Nexo polypeptides had been modified by the addition
of carbohydrate, and the shift in mobility is consistent with the use
of the new N-terminal glycosylation site. Furthermore, the relative
abundance of the singly glycosylated 1* and 2* Nexo forms (10% and
30%) correlates well with the amount of their unglycosylated HN
counterparts seen in Figure 1 (12% and 30%). Taken together, these
biochemical data indicate that the mutant Nexo species represents an
integral membrane protein with a large C-terminal cytoplasmic region
and a small N-terminal domain in the ER lumen, and thus these
molecules are the result of a bona fide inversion of the HN type II
topology.

Additional HN mutants (Figure 3, 10*-13*) were constructed to
determine whether an arginine (R) residue directly flanking the HN
N-terminal side of the S/A was required for the establishment of the
NcytCexo topology. The HN WT* cDNA was altered by mutagenesis such
that a negatively charged glutamic acid (E), a positively charged
lysine (K), an uncharged glutamine (Q), or a histidine (H) residue,
the latter of which can be weakly positively charged depending on the
intracellular pH, replaced the R residue that normally flanks the S/A
domain. These mutants were expressed from plasmids and analyzed as
described above for mutants HN WT*, 1*, and 2*. As shown in Figure 3,
approximately 10% of the protein from the mutants containing the E, Q,
or H substitution was found in the Nexo form, and this value was very
similar to that obtained with the 1* mutant. Expression of the K
substitution mutant 11* (lane 11*) resulted in a polypeptide mobility
pattern that was indistinguishable from that of HN WT*, indicating
that a positively charged residue directly flanking the N-terminal
side of the S/A is very important for establishing the HN type II
topology.

The HN N-Terminal Domain Directs the Inversion
of M2 into a Type II Topology

The above data suggest that the HN N-terminal domain
plays a critical role in governing the type II orientation and
that this region may contain a signal that is incompatible






Figure 4. Transfer of the HN N-Terminal Domain to M2 Results in a
Type II Chimeric Polypeptide

(A) Expression of M2g, HgMM, and HgMM.1. Vaccinia vTF7-3-infected CV-1
cells were transfected with plasmids encoding M2g, HgMM, or HgMM.1 and
radiolabeled for 2 hr with [35S]methionine and [35S]cysteine. Samples
were immunoprecipitated from cell extracts with M2 antisera, incubated
with (+) or without (-) N-glycanase, and analyzed by SDS-PAGE. Lane M,
influenza A virus-infected cell polypeptides as a marker; the
fastest-migrating species is authentic M2 polypeptide.
(B) Alkali extraction of membranes from cells expressing M2g or HgMM.
Vaccinia virus vTF7-3-infected cells were transfected with plasmids
encoding M2g or HgMM and were radiolabeled for 2 hr with
[35S]methionine and [35S]cysteine. Crude microsomal membranes were
prepared, incubated with pH 11.0 buffer, and fractionated by
centrifugation. Equal portions of the resulting supernatant (S) or
pellet (P) fractions were immunoprecipitated with antisera, and the
samples were examined by SDS-PAGE. The N-terminal amino acid sequences
of the mutants are listed, with the HgMM.1 N-terminal horizontal line
denoting sequence identity with HgMM (Hg is identical to HN WT*). S/A,
M2 signal/anchor domain.



Mutant    N-terminal sequence preceding the S/A    % Nexo

M2g       MSNLTEVETPIRNEWGCRCNDSSD                 100
HgMM      MVNATEDAPVRATCRVLFR                      8
HgMM.1    ----------------E-S                      60



with its translocation across the membrane. A prediction of this
proposal would be that the type III NexoCcyt-oriented M2 protein would
lack this N-terminal retention signal, but that transfer of the HN
N-terminal domain to the M2 protein should direct an inversion of M2
to the type II topology. To test this prediction, a hybrid cDNA
molecule was constructed (Figure 4, HgMM) such that it encoded the HN
WT* N-terminal 19 residues linked precisely to the M2 S/A and
cytoplasmic domains. The addition of carbohydrate residues to this
chimera would indicate that the HN N-terminal domain, which contains
the only site for N-linked glycosylation, has been translocated across
the ER membrane. Vaccinia virus vTF7-3-infected cells were transfected
with plasmids encoding the HgMM chimera or M2g, a modified version of
the M2 protein that contains an N-terminal site for N-linked
glycosylation to facilitate the analysis of M2 membrane topology
(Parks et al., 1989). The cells were labeled with [35S]cysteine and
[35S]methionine for 2 hr, and the proteins were immunoprecipitated
with M2-specific antisera, incubated with (+) or without (-)
N-glycanase, and examined by SDS-PAGE.

The M2g protein was synthesized as a major species (Figure 4A, M2g,
- lane), which has a slower electrophoretic mobility than the
N-glycanase-treated protein (+ lane); this is consistent with the
known NexoCcyt topology of M2 (Lamb et al., 1985). The M2g protein was
observed to migrate as a doublet; this may reflect differential
processing of the carbohydrate residues. In contrast, only 8% of the
HgMM protein was glycosylated and the vast majority of the HgMM
protein was synthesized as an unglycosylated polypeptide (HgMM,
- lane) exhibiting an electrophoretic mobility indistinguishable from
that of the N-glycanase-treated sample (+ lane). Alkali extraction of
microsomal membranes isolated from cells expressing M2g or HgMM showed
that both of these polypeptides were strongly associated with the
membrane, as they were found in the pellet fraction and not in the
supernatant (Figure 4B). Thus, these data indicate that the vast
majority of the chimeric HgMM protein is oriented as a type II
protein. Parenthetically, the observed type II topology of HgMM
differs from that predicted by the charge difference rules (Hartmann
et al., 1989), as the sums of the charges flanking the HgMM S/A on the
N- and C-terminal sides are +1 and +1.5, respectively.
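
The flanking-charge sum used in this comparison can be reproduced with a short calculation. The sketch below is illustrative only: it assumes a scoring of +1 for K and R, -1 for D and E, and +0.5 for H (the half charge for histidine is an assumption made here, not a value stated in the text), and a 15-residue window as in Figure 1. The function name is hypothetical. Applied to the HgMM N-terminal residues listed in Figure 4, it gives the +1 quoted for the N-terminal side.

```python
# Illustrative scoring of charged residues flanking a signal/anchor (S/A) domain.
# Assumed weights: K, R = +1; D, E = -1; H = +0.5 (the histidine value is an assumption).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0, "H": 0.5}


def flank_charge(flank: str, window: int = 15, side: str = "N") -> float:
    """Sum residue charges in the `window` residues nearest the S/A domain.

    side="N": `flank` precedes the S/A, so its last residues are scored.
    side="C": `flank` follows the S/A, so its first residues are scored.
    """
    if side not in ("N", "C"):
        raise ValueError("side must be 'N' or 'C'")
    if window < 1:
        raise ValueError("window must be at least 1")
    nearest = flank[-window:] if side == "N" else flank[:window]
    return sum(CHARGE.get(res, 0.0) for res in nearest)


# HgMM N-terminal residues 1-19 as listed in Figure 4.
HGMM_N = "MVNATEDAPVRATCRVLFR"
print(flank_charge(HGMM_N))  # 1.0
```

The C-terminal flank can be scored the same way with `side="C"` once its sequence is supplied.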

The results obtained with HN mutants 10*-13* indicate
that a positive charge immediately flanking the HN S/A is






Figure 5. Positive N-Terminal Charges Convert the MgHH Chimeric
Protein to a Type II Orientation

CV-1 cells infected with vaccinia vTF7-3 were transfected with plasmid
DNA encoding MgHH, MgHH.1, or MgHH.2 and radiolabeled for 1 hr with
Tran[35S]label. HN proteins were immunoprecipitated from cell extracts
with HN antisera, incubated with (+) or without (-) N-glycanase, and
the polypeptides were examined by SDS-PAGE. The N-terminal amino acids
in the mutants are listed with horizontal lines denoting sequence
identity with M2g. S/A, HN signal/anchor domain.



Mutant    N-terminal sequence preceding the S/A    % Nexo

MgHH      MSNLTEVETPIRNEWGCRCNDSSD                 60
MgHH.1    ----------------------R-                 40
MgHH.2    --------------------G--R                 3



important for establishing a type II orientation. A charge-altered
form of the HgMM chimera (HgMM.1, Figure 4) was constructed to test
whether a positive charge flanking the N-terminal side of the S/A was
also a factor in establishing the HgMM NcytCexo topology. The HgMM.1
mutant, which encoded the same L to E and R to S mutations previously
analyzed in HN mutant 1*, was expressed and analyzed as described
above for M2g and HgMM. As shown in Figure 4, this charge-altered
chimera (HgMM.1, - lane) was synthesized as a mixture of a
slow-migrating glycosylated form and a faster-migrating species with a
mobility matching that of the N-glycanase-treated sample (+ lane). In
contrast to the predominant type II orientation of the unaltered HgMM
hybrid, approximately 60% of the modified HgMM.1 protein was found to
be glycosylated and thus must be in an NexoCcyt topology. Thus, these
data suggest that the HN N-terminal region can direct an inversion of
the M2 polypeptide from the NexoCcyt topology to that of a type II
protein, and this efficient inversion is disrupted by the removal of
N-terminal positively charged residues.

Ability of the M2 N-Terminal Domain to Direct
the Type II Topology

We were interested in determining if the reciprocal experiment to that
described in the section above could be performed, i.e., to convert a
type III domain into a type II Ncyt domain by the addition of
positively charged residues. We have previously described the
properties of a chimeric M2/HN protein containing the M2g N-terminal
24 residues linked precisely to the HN S/A and C-terminal domains
(Parks et al., 1989). This MgHH hybrid is synthesized as a single
polypeptide chain that adopts two opposing orientations in membranes,
with approximately 60% of the protein in the faster-migrating form
(Figure 5, MgHH panel). Minor faster-migrating species are degradation
products of the HN ectodomain (Parks et al., 1989; Ng et al., 1989).
The effect of the addition of positively charged N-terminal residues
on the orientation of this hybrid was examined by constructing two
charge-altered MgHH mutants.

In MgHH.1, a single R residue was substituted for the M2g N-terminal
serine at amino acid residue 23, and MgHH.2 encoded a substitution of
the two negatively charged M2 aspartic acid (D) residues at positions
21 and 24 with glycine (G) and arginine (R), respectively (Figure 5).
The rationale for the addition of a G residue at position 21 in MgHH.2
was based on the finding that this was a naturally occurring change in
the N-terminal ectodomain between the M2 proteins of the Udorn/72 and
PR/8/34 strains of influenza A virus (Lamb and Lai, 1981; Lamb et al.,
1985). Thus, it is known that a D to G substitution at this position
does not alter the M2 protein orientation but would contribute
generally to the N-terminal charge distribution. CV-1 cells infected
with vaccinia vTF7-3 were transfected with plasmids encoding the
altered MgHH hybrids and labeled for 1 hr with Tran[35S]label.
Proteins were immunoprecipitated with HN antisera, incubated with (+)
or without (-) N-glycanase, and analyzed by SDS-PAGE. As shown in
Figure 5, the MgHH.1 mutant was synthesized as two major species
(Figure 5, panel MgHH.1, - lane) that were converted to a single
faster-migrating form after N-glycanase treatment (+ lane), and 40% of
this modified chimera was in the Nexo orientation. In contrast, the
MgHH.2 mutant that contained a positively charged arginine residue
flanking the S/A was predominantly in the type II Ncyt orientation,
with only 3% of the protein in the Nexo topology (Figure 5, panel
MgHH.2, - lane). Thus, these data indicate that the addition of
positively charged residues to the M2 N-terminal ectodomain next to
the S/A domain alters this region such that it can adopt a type II
topology.
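
As a rough check on how these substitutions shift the N-terminal charge, the sketch below applies them to the M2g N-terminal 24 residues listed in Figure 5 and sums the charges of the 15 residues nearest the S/A. The scoring (+1 for K and R, -1 for D and E), the 15-residue window, and the helper names are assumptions made for illustration; they are not taken from the methods of this study.

```python
# Apply the MgHH.1 and MgHH.2 substitutions described in the text and
# compare the net charge of the 15 residues preceding the S/A domain.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}  # assumed scoring

M2G_N = "MSNLTEVETPIRNEWGCRCNDSSD"  # M2g N-terminal residues 1-24 (Figure 5)


def substitute(seq: str, changes: dict) -> str:
    """Return `seq` with 1-based positions replaced, e.g. {23: ("S", "R")}."""
    residues = list(seq)
    for pos, (old, new) in changes.items():
        if residues[pos - 1] != old:
            raise ValueError(f"expected {old} at position {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)


def net_charge(seq: str, window: int = 15) -> int:
    """Net charge of the last `window` residues (those nearest the S/A)."""
    return sum(CHARGE.get(res, 0) for res in seq[-window:])


variants = {
    "MgHH": M2G_N,
    "MgHH.1": substitute(M2G_N, {23: ("S", "R")}),
    "MgHH.2": substitute(M2G_N, {21: ("D", "G"), 24: ("D", "R")}),
}
for name, seq in variants.items():
    print(name, seq[-15:], net_charge(seq))
```

Under these assumptions the net charge rises from -1 (MgHH) to 0 (MgHH.1) and +2 (MgHH.2), which parallels the decrease in % Nexo reported above.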

Discussion 

We wished to test the role of charged residues flanking the S/A domain
in determining orientation since the biochemical mechanism involved in
generating the topology of eukaryotic membrane proteins with an
internal uncleaved S/A has not been established. For the purposes of
discussion the boundaries of the S/A domain are defined as the first
charged residue in both directions from the middle of the first
hydrophobic domain. The signals directing the orientation of proteins
in the ER membrane can be thought of in simple terms as either acting
to promote the translocation of the N-terminus of a type III protein
across the membrane, acting to retain the N-terminus of type II
proteins in the cytoplasm, or both signals could exist with one being
dominant. Our data emphasize the importance of N-terminal positive
charges in generating the type II orientation. Removal of positively
charged residues from the Ncyt domain resulted in some of the HN
molecules assuming an inverted orientation in membranes. However, as
the inversion was not absolute, it suggests that the absence of a
positively charged residue is not the sole factor involved in
generating the type III orientation. In part, the mixed orientation of
the chimeric proteins (i.e., MgHH) before the charges were altered may
reflect difficulties involved with using chimeric proteins rather than
naturally existing proteins. Interestingly, the addition of charges to
the C-terminal side of the HN S/A domain in the absence of the
N-terminal positively charged residue resulted in more efficient
inversion, as discussed further below. Previously it has been found
that the addition of N-terminal positively charged residues inverts
the type III cytochrome P450 protein, but because of exposure of a
cryptic site for cleavage by signal peptidase it becomes a secreted
protein (Szczesna-Skorupa et al., 1988; Sato et al., 1990). In
addition, it has been found that by switching domains in chimeric
proteins, which leads to alterations in the positions of charged
residues, membrane topology can be altered both in vitro and in vivo
(Haeuptle et al., 1989; Parks et al., 1989).

We favor the idea that the N-terminal positively charged residue
flanking the S/A domain is an important part of a dominantly acting
retention signal that retains the N-terminus of the nascent
polypeptide chain in the cytoplasm to create the type II orientation,
and that this retention signal is not present in the N-terminal domain
of type III proteins. This conclusion is based on several lines of
evidence in addition to the data obtained with the N-terminal
charge-altered mutants. First, linking of the HN N-terminal domain to
the M2 S/A and C-terminal regions produces a chimeric protein (HgMM)
that largely adopts the HN topology, indicating that the dominant
determinant of type II topology had been transferred to M2, and that
this HN signal could efficiently override any possible topological
signals in the M2 S/A and cytoplasmic domains. Second, the M2
N-terminal ectodomain, although only 60% efficient at directing the
chimera MgHH into the type III orientation, can be altered to
efficiently direct the MgHH chimera into the type II orientation when
positive charges are introduced into the N-terminal S/A flanking
positions. This suggests that the nature of the sequence that
comprises a cytoplasmic domain is less critical for generating type II
topology than the presence of the appropriately positioned positively
charged residue, and that it is possible to create the signal that
specifies type II topology.

These data support the "positive inside" rule proposed previously (von
Heijne and Gavel, 1988) and for which evidence has recently been
provided in the case of a bacterial membrane protein (Nilsson and von
Heijne, 1990) in that positive charges are an important factor
directing HN membrane topology. However, the orientation of the HN
protein is more sensitive to the removal of N-terminal positive
charges than to the addition of C-terminal positive charges, and this
indicates that the topology of eukaryotic type II proteins is not
determined simply by the retention of the most positively charged
domain. Once the N-terminal positive charge has been removed, the
subsequent addition of positive charges to the C-terminal side of the
S/A may operate to keep this domain in the cytoplasm (Figure 1,
mutants 6-9). Thus, eukaryotes and prokaryotes may share a common
mechanism for generating membrane protein topology by which charged
residues provide a barrier to translocation, but their mechanisms may
differ from each other in the relative importance of N-terminal
positive charges.

It was originally suggested on theoretical grounds, and then supported
experimentally, that the signal sequence of type I and II proteins is
inserted into the ER membrane as a hairpin loop with both the N- and
C-terminal regions located in the cytoplasm (von Heijne and Blomberg,
1979; Inouye and Halegoua, 1980; Engelman and Steitz, 1981; Shaw et
al., 1988). As the insertion of type III membrane proteins into
membranes is dependent on recognition of the S/A by the signal
recognition particle (Hull et al., 1988), the nascent type III chain
probably also adopts a loop structure. However, after membrane
insertion as a hairpin loop, the critical step in generating the type
III topology involves the translocation of the N-terminal domain
across the lipid bilayer. The N- to C-terminal polarity of protein
synthesis implies that the N-terminal region flanking the S/A of a
nascent polypeptide would be exposed to the translocation machinery
prior to complete exposure of the C-terminal flanking region, and it
has been suggested that the transfer of the type III N-terminal domain
across the membrane may occur very fast relative to the rate of
translation (von Heijne, 1986b). In contrast, the presence of a
positive-charge signal in the N-terminal region of the nascent
polypeptide chain of type I proteins and the mature polypeptide chain
of type II proteins imparts cytoplasmic retention of this domain, and
the C-terminal region is translocated. Although the topology of the
immature type I and mature type II proteins may ultimately be
dependent on the presence or absence of an available site for cleavage
by signal peptidase (Lipp and Dobberstein, 1986a; Shaw et al., 1988),
what distinguishes them from type III proteins is that during
synthesis there is retention of the N-terminus in the cytoplasm.

It is not known whether retention of the N-terminal domain of nascent
type I and mature type II proteins is due to binding by cytoplasmic
factors or if a local electrical potential across the membrane makes
translocation of this region thermodynamically unfavorable (Weinstein
et al., 1982). The translocation of a polypeptide chain into the ER
could occur, at least in theory, by direct transfer through the
hydrophobic environment of the lipid bilayer (von Heijne and Blomberg,
1979; Engelman and Steitz, 1981) or through a protein pore in the
membrane (Blobel and Dobberstein, 1975; Gilmore and Blobel, 1985), but
recent evidence suggests that during translocation the nascent chain
is associated with distinct membrane-bound proteins (Connolly et al.,
1989; Nicchitta and Blobel, 1989). In the case of prokaryotes, it has
been suggested that the Escherichia coli SecA protein directs protein
translocation by recognizing N-terminal positive charges within a
signal sequence (Akita et al., 1990), and it seems possible that an
analogous protein may operate similarly in eukaryotes. The ability to
reconstitute membrane translocation in vitro from disrupted microsomes
(Nicchitta and Blobel, 1990) may provide the means to separate and
assess the role of individual components of the translocation
machinery in directing membrane topology.

Experimental Procedures 

Plasmid Construction and Site-Specific Mutagenesis

cDNA clones that express wild-type SV5 HN (pSV103HNm; Hiebert et al.,
1985a; Paterson et al., 1985) and M2g, a derivative of influenza A
virus M2 containing an N-terminal site for N-linked glycosylation
(Parks et al., 1989), were used as a source of starting materials for
the construction of the altered genes. Bacteriophage M13M2g
(containing the entire M2g cDNA in the BamHI site of the replicative
form of M13mp19) and M13HN (containing 5' nucleotides 1-306 and
encoding N-terminal residues 1-61 from the HN gene) were used as
template DNA for oligonucleotide-directed mutagenesis as described
previously (Parks et al., 1989). Oligonucleotides were synthesized by
the Northwestern University Biotechnology Facility on a DNA
synthesizer (Model 380B, Applied Biosystems Inc., Foster City, CA).

Mutant HN DNA segments were excised from the replicative form of M13
by EcoRI and PstI digestion, subcloned into a pGem vector, and their
nucleotide sequence confirmed by dideoxynucleotide chain-terminating
sequencing (Sanger et al., 1977). The altered 5' end DNA fragments
were then reconstructed into a full-length gene by linkage to the HN
PstI-XhoI fragment (encoding residues 82-565) in pGem3 such that mRNA
sense transcripts could be generated using the bacteriophage T7 RNA
polymerase promoter and T7 RNA polymerase. pGem-HNWT*, which encodes
an N-terminal site for N-linked glycosylation (Asn-Ala-Thr), was
constructed by the insertion of codons for Asn and Thr between HN
bases 72-73 and 75-76, respectively.
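
A site for N-linked glycosylation is conventionally recognized as the sequon Asn-X-Ser/Thr, where X is any residue except proline. The sketch below is offered only as an illustration (the function name is hypothetical): it scans the HN WT* N-terminal residues listed in Figure 4 for that pattern and locates the engineered Asn-Ala-Thr site.

```python
import re

# N-linked glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# A lookahead is used so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(seq: str) -> list:
    """Return (1-based position of Asn, tripeptide) for each sequon in `seq`."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq)]


HN_WT_STAR_N = "MVNATEDAPVRATCRVLFR"  # HN WT* N-terminal residues 1-19 (Figure 4)
print(find_sequons(HN_WT_STAR_N))  # [(3, 'NAT')]
```

Running the same function on the unmodified WT HN N-terminal residues shown in Figure 1 (MVAEDAPVRATCRVLFR) returns an empty list, consistent with the site having been introduced by the two inserted codons.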

The HgMM gene was constructed by introducing a new StuI site (AGGCCT),
which encodes the junction of the HN N-terminal and M2 S/A domains
(Arg-Pro), into the HN WT* (bases 115-120) and M2 (bases 95-100) cDNA
fragments by oligonucleotide-directed mutagenesis. Blunt-end ligation
of the EcoRI-StuI HN WT* fragment to the StuI-PstI M2 fragment in the
EcoRI and PstI sites of pGem3 yielded a DNA segment that encoded the
HN WT* N-terminal residues 1-19 linked precisely to the M2 S/A and
C-terminal domains (residues 25-52; Lamb et al., 1985). Similarly,
HgMM.1 was constructed by blunt-end ligation of the HN mutant 1*
EcoRI-ScaI and M2 StuI-PstI fragments into pGem3. The construction and
characterization of MgHH has been described previously (Parks et al.,
1989). MgHH.1 and MgHH.2 were constructed by site-specific mutagenesis
as described (Parks et al., 1989). Nucleotide sequences were confirmed
by dideoxynucleotide chain-terminating sequencing (Sanger et al.,
1977).
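
The engineered junction can be checked by reading the StuI recognition sequence as two codons. The sketch below uses a deliberately minimal codon table (only the two codons needed here, taken from the standard genetic code) and a hypothetical helper name; it is an illustration, not part of the cloning procedure itself.

```python
# Minimal codon table: only the codons needed for this check (standard genetic code).
CODONS = {"AGG": "Arg", "CCT": "Pro"}


def translate(dna: str) -> list:
    """Translate an in-frame DNA string codon by codon using CODONS."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return [CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3)]


STUI_SITE = "AGGCCT"  # StuI recognition sequence given in the text
print("-".join(translate(STUI_SITE)))  # Arg-Pro
```

Because StuI cuts between AGG and CCT to leave blunt ends, the cut falls exactly between the two codons, so a fragment ending in AGG and a fragment beginning with CCT rejoin in frame.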

Cells 

Monolayer cultures of CV-1 cells were grown in Dulbecco's modified
Eagle's medium containing 10% fetal calf serum as described (Lamb
and Lai, 1982).

Isotopic Labeling of Polypeptides, Immunoprecipitation,
Peptide:N-Glycosidase F Digestions, and
Polyacrylamide Gel Electrophoresis

cDNA clones were expressed by a modification of the vaccinia virus-T7
RNA polymerase system of Fuerst et al. (1986). In brief, confluent
monolayers of CV-1 cells (6 cm diameter plates) were infected
(multiplicity of infection ^ 20) with recombinant vaccinia virus
vTF7-3, which encodes the bacteriophage T7 RNA polymerase. The
inoculum was removed and calcium phosphate-precipitated plasmid DNA
(^vso ^g) was then added. Cells transfected with plasmids encoding HN
mutants were radiolabeled from 3.5-4.5 hr posttransfection with 20-50
µCi/ml Tran[35S]label (ICN Radiochemicals Inc., Irvine, CA) in
Dulbecco's modified Eagle's medium lacking cysteine and methionine.
Radiolabeled cells were washed in phosphate-buffered saline and lysed
in 1% SDS. Immunoprecipitation from cell extracts using polyclonal
rabbit sera to denatured HN (HN antisera) was as described (Ng et al.,
1990; Erickson and Blobel, 1979). Cells transfected with plasmids
encoding M2g or the HN/M2 hybrids were radiolabeled from 3.5-5.5 hr
posttransfection with a mixture of [35S]cysteine and [35S]methionine
(125 µCi/ml each), and the proteins were immunoprecipitated from cells
solubilized in cold RIPA buffer (Lamb et al., 1978) using polyclonal
sera raised to denatured M2 (DM2 antisera; Lamb et al., 1985).
Treatment of samples with peptide:N-glycosidase F was as described
previously (Williams and Lamb, 1986). Samples were analyzed by
SDS-PAGE on 10% polyacrylamide gels (HN proteins) or 17.5% gels
containing 4 M urea (M2g and HN/M2 hybrid proteins), followed by
fluorography (Lamb and Choppin, 1976). Densitometric scanning of
autoradiograms was carried out using an LKB Ultrascan XL laser
densitometer (Pharmacia-LKB, Bromma, Sweden). The % Nexo values
reported represent the average of at least two experiments.
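
The % Nexo values follow from the relative band intensities measured by densitometry. A minimal sketch of that arithmetic is given below; the intensity numbers are made-up placeholders rather than measurements from this study, and the function names are hypothetical.

```python
from statistics import mean


def percent_nexo(nexo: float, ncyt: float) -> float:
    """Percentage of total HN signal found in the Nexo band."""
    total = nexo + ncyt
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * nexo / total


def average_percent_nexo(experiments: list) -> float:
    """Average % Nexo over (nexo, ncyt) intensity pairs; at least two are required."""
    if len(experiments) < 2:
        raise ValueError("at least two experiments are required")
    return mean(percent_nexo(nexo, ncyt) for nexo, ncyt in experiments)


# Placeholder densitometer readings (arbitrary units), not data from the paper.
readings = [(30.0, 70.0), (25.0, 75.0)]
print(average_percent_nexo(readings))  # 27.5
```

Each experiment is converted to a percentage before averaging, so that gels with different overall exposure contribute equally.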

Trypsin Digestion and Alkali Extraction
of Microsomal Membranes

Vaccinia virus vTF7-3-infected cells were transfected with plasmid
DNAs and radiolabeled from 3.5-4.5 hr post-DNA transfection with 20
µCi/ml Tran[35S]label (HN proteins) or from 3 to 5 hr posttransfection
with 250 µCi/ml [35S]cysteine and [35S]methionine (M2g and HN/M2
proteins) before the preparation of crude microsomal membranes by
Dounce homogenization (Adams and Rose, 1985). Samples were analyzed by
trypsin digestion or alkali fractionation as described previously
(Parks et al., 1989).

Acknowledgments 

We thank Margaret Shaughnessy for excellent technical assistance
and David Simpson for advice on the vaccinia virus expression system.
We thank Bernard Moss and Thomas Fuerst for supplying vaccinia virus
vTF7-3. G. D. P. was supported by an American Cancer Society
postdoctoral fellowship (PF^ITT). This research was supported by
Public Health Service research grants AI-20201 and AI-23173 from the
National Institute of Allergy and Infectious Diseases.

The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby
marked "advertisement" in accordance with 18 USC Section 1734
solely to indicate this fact.

Received October 17, 1990; revised November 19, 1990.

References 

Adams, G. A., and Rose, J. K. (1985). Incorporation of a charged amino
acid into the membrane spanning domain blocks cell surface transport
but not membrane anchoring of a viral glycoprotein. Mol. Cell. Biol. 5,
1442-1448.

Akita, M., Sasaki, S., Matsuyama, S.-i., and Mizushima, S. (1990).
SecA interacts with secretory proteins by recognizing the positive
charge at the amino terminus of the signal peptide in Escherichia coli.
J. Biol. Chem. 265, 8164-8189.

Blobel, G. (1980). Intracellular protein topogenesis. Proc. Natl. Acad.
Sci. USA 77, 1496-1500.

Blobel, G., and Dobberstein, B. (1975). Transfer of proteins across
membranes. I. Presence of proteolytically processed and unprocessed
nascent immunoglobulin light chains on membrane-bound ribosomes
of murine myeloma. J. Cell Biol. 67, 835-851.

Connolly, T., Collins, P., and Gilmore, R. (1989). Access of proteinase
K to partially translocated nascent polypeptides in intact and detergent
solubilized membranes. J. Cell Biol. 108, 299-308.

Engelman, D. M., and Steitz, T. A. (1981). The spontaneous insertion
of proteins into and across membranes: the helical hairpin hypothesis.
Cell 23, 411-422.

Erickson, A. H., and Blobel, G. (1979). Early events in the biosynthesis
of lysosomal enzyme cathepsin. J. Biol. Chem. 254, 11771-11774.

Evans, E. A., Gilmore, R., and Blobel, G. (1986). Purification of
microsomal signal peptidase as a complex. Proc. Natl. Acad. Sci. USA 83,
581-585.

Fuerst, T. R., Niles, E. G., Studier, F. W., and Moss, B. (1986).
Eukaryotic transient-expression system based on recombinant vaccinia virus
that synthesizes bacteriophage T7 RNA polymerase. Proc. Natl. Acad.
Sci. USA 83, 8122-8126.

Gilmore, R., and Blobel, G. (1985). Translocation of secretory proteins
across the microsomal membrane occurs through an environment
accessible to aqueous perturbants. Cell 42, 497-505.

Haeuptle, M.-T., Flint, N., Gough, N. M., and Dobberstein, B. (1989). A
tripartite structure of the signals that determine protein insertion into
the endoplasmic reticulum membrane. J. Cell Biol. 108, 1227-1236.

Hartmann, E., Rapoport, T. A., and Lodish, H. F. (1989). Predicting the
orientation of eukaryotic membrane-spanning proteins. Proc. Natl.
Acad. Sci. USA 86, 5786-5790.

Hiebert, S. W., Paterson, R. G., and Lamb, R. A. (1985a). Hemag-
glutinin-neuraminidase protein of the paramyxovirus simian virus 5:
nucleotide sequence of the mRNA predicts an N-terminal membrane
anchor. J. Virol. 54, 1-6.

Hiebert, S. W., Paterson, R. G., and Lamb, R. A. (1985b). Identification
and predicted sequence of a previously unrecognized small hydropho-
bic protein, SH, of the paramyxovirus simian virus 5. J. Virol. 55,
744-751.

High, S., and Tanner, M. J. A. (1987). Human erythrocyte membrane
sialoglycoprotein β. Biochem. J. 243, 277-280.

Hull, J. D., Gilmore, R., and Lamb, R. A. (1988). Integration of a small
integral membrane protein, M2, of influenza virus into the endoplas-
mic reticulum: analysis of the internal signal-anchor domain of a pro-
tein with an ectoplasmic NH2 terminus. J. Cell Biol. 106, 1489-1498.

Inouye, M., and Halegoua, S. (1980). Secretion and membrane local-
ization of proteins in Escherichia coli. CRC Crit. Rev. Biochem. 7,
339-371.

Lamb, R. A., and Choppin, P. W. (1976). Synthesis of influenza virus
proteins in infected cells: translation of viral polypeptides, including
three P polypeptides, from RNA produced by primary transcription.
Virology 74, 504-519.

Lamb, R. A., and Lai, C.-J. (1981). Conservation of the influenza virus
membrane protein (M1) amino acid sequence and an open reading
frame of RNA segment 7 encoding a second protein (M2) in H1N1 and
H3N2 strains. Virology 112, 746-751.

Lamb, R. A., and Lai, C.-J. (1982). Spliced and unspliced messenger
RNAs synthesized from cloned influenza virus DNA in an SV40 vector:
expression of the influenza virus membrane protein (M1). Virology
123, 237-256.

Lamb, R. A., Etkind, P. R., and Choppin, P. W. (1978). Evidence for a
ninth influenza viral polypeptide. Virology 91, 60-78.

Lamb, R. A., Zebedee, S. L., and Richardson, C. D. (1985). Influenza
virus M2 protein is an integral membrane protein expressed on the
infected-cell surface. Cell 40, 627-633.

Lipp, J., and Dobberstein, B. (1986a). The membrane-spanning seg-
ment of invariant chain (Ii) contains a potentially cleavable signal se-
quence. Cell 46, 1103-1112.

Lipp, J., and Dobberstein, B. (1986b). Signal recognition particle-
dependent membrane insertion of mouse invariant chain: a mem-
brane-spanning protein with a cytoplasmically exposed amino termi-
nus. J. Cell Biol. 102, 2169-2175.

Ng, D. T. W., Randall, R. E., and Lamb, R. A. (1989). Intracellular matu-
ration and transport of the SV5 type II glycoprotein hemagglutinin-
neuraminidase: specific and transient association with GRP78-BiP in
the endoplasmic reticulum and extensive internalization from the cell
surface. J. Cell Biol. 109, 3273-3289.

Ng, D. T. W., Hiebert, S. W., and Lamb, R. A. (1990). Different roles of
individual N-linked oligosaccharide chains on the folding, assembly
and transport of the simian virus 5 hemagglutinin-neuraminidase inte-
gral membrane glycoprotein. Mol. Cell. Biol. 10, 1989-2001.

Nicchitta, C. V., and Blobel, G. (1989). Nascent secretory chain binding
and translocation are distinct processes: differentiation by chemical
alkylation. J. Cell Biol. 108, 789-796.

Nicchitta, C. V., and Blobel, G. (1990). Assembly of translocation-
competent proteoliposomes from detergent-solubilized rough micro-
somes. Cell 60, 259-269.

Nilsson, I., and von Heijne, G. (1990). Fine-tuning the topology of a
polytopic membrane protein: role of positively and negatively charged
amino acids. Cell 62, 1135-1141.

Parks, G. D., Hull, J. D., and Lamb, R. A. (1989). Transposition of do-
mains between the M2 and HN viral membrane proteins results in
polypeptides which can adopt more than one membrane orientation.
J. Cell Biol. 109, 2023-2032.

Paterson, R. G., Hiebert, S. W., and Lamb, R. A. (1985). Expression at
the cell surface of biologically active fusion and hemagglutinin-neu-
raminidase proteins of the paramyxovirus SV5 from cloned cDNA.
Proc. Natl. Acad. Sci. USA 82, 7520-7524.

Sanger, F., Nicklen, S., and Coulson, A. R. (1977). DNA sequencing
with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-
5467.

Sato, T., Sakaguchi, M., Mihara, K., and Omura, T. (1990). The amino-
terminal structures that determine topological orientation of cytochrome
P-450 in microsomal membranes. EMBO J. 9, 2391-2397.

Schatzman, R. C., Evan, G. I., Privalsky, M. L., and Bishop, J. M.
(1986). Orientation of the v-erb-B gene product in the plasma mem-
brane. Mol. Cell. Biol. 6, 1329-1333.

Schneider, C., Owen, M. J., Banville, D., and Williams, G. W. (1984).
Primary structure of human transferrin receptor deduced from the
mRNA sequence. Nature 311, 675-678.

Shaw, A. S., Rottier, P. J. M., and Rose, J. K. (1988). Evidence for the
loop model of signal-sequence insertion into the endoplasmic reticu-
lum. Proc. Natl. Acad. Sci. USA 85, 7592-7596.

Spiess, M., and Lodish, H. F. (1985). Sequence of a second human
asialoglycoprotein receptor: conservation of two receptor genes during
evolution. Proc. Natl. Acad. Sci. USA 82, 6465-6469.

Steck, T. L., and Yu, J. (1973). Selective solubilization of proteins from
red blood cell membranes by protein perturbants. J. Supramol. Struct.
1, 220-248.

Strubin, M., Mach, B., and Long, E. O. (1984). The complete sequence
of the mRNA for the HLA-DR-associated invariant chain reveals a poly-
peptide with an unusual transmembrane polarity. EMBO J. 3, 869-872.






Szczesna-Skorupa, E., Browne, N., Mead, D., and Kemper, B. (1988).
Positive charges at the N-terminus convert the membrane-anchor sig-
nal peptide of cytochrome P-450 to a secretory signal peptide. Proc.
Natl. Acad. Sci. USA 85, 738-742.

von Heijne, G. (1986a). The distribution of positively charged residues
in bacterial inner membrane proteins correlates with the trans-mem-
brane topology. EMBO J. 5, 3021-3027.

von Heijne, G. (1986b). Towards a comparative anatomy of N-terminal
topogenic protein sequences. J. Mol. Biol. 189, 239-242.

von Heijne, G. (1988). Transcending the impenetrable: how proteins
come to terms with membranes. Biochim. Biophys. Acta 947, 307-333.

von Heijne, G., and Blomberg, C. (1979). Trans-membrane transloca-
tion of proteins: the direct transfer model. Eur. J. Biochem. 97, 175-181.

von Heijne, G., and Gavel, Y. (1988). Topogenic signals in integral
membrane proteins. Eur. J. Biochem. 174, 671-678.

Walter, P., and Lingappa, V. R. (1986). Mechanism of protein transloca-
tion across the endoplasmic reticulum membrane. Annu. Rev. Cell
Biol. 2, 499-516.

Weinstein, J. N., Blumenthal, R., van Renswoude, J., Kempf, C., and
Klausner, R. D. (1982). Charge clusters and the orientation of mem-
brane proteins. J. Membr. Biol. 66, 203-212.

Williams, M. A., and Lamb, R. A. (1988). Determination of the orienta-
tion of an integral membrane protein and sites of glycosylation by
oligonucleotide-directed mutagenesis: influenza B virus NB glycopro-
tein lacks a cleavable signal sequence and has an extracellular N-ter-
minal region. Mol. Cell. Biol. 6, 4317-4328.

Zerial, M., Huylebroeck, D., and Garoff, H. (1987). Foreign transmem-
brane peptides replacing the internal signal sequence of transferrin
receptor allow its translocation and membrane binding. Cell 46,
147-155.



Exhibit 28 



Proc. Natl. Acad. Sci. USA
Vol. 85, pp. 2444-2448, April 1988
Biochemistry

Improved tools for biological sequence comparison 

(amino acid/nucleic acid/data base searches/local similarity)

William R. Pearson* and David J. Lipman†

*Department of Biochemistry, University of Virginia, Charlottesville, VA 22908; and †Mathematical Research Branch, National Institute of Diabetes and
Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892

Communicated by Gerald M. Rubin, December 2, 1987 (received for review September 17, 1987)



ABSTRACT We have developed three computer pro- 
grams for comparisons of protein and DNA sequences. They 
can be used to search sequence data bases, evaluate similarity 
scores, and identify periodic structures based on local se- 
quence similarity. The FASTA program is a more sensitive 
derivative of the FASTP program, which can be used to search 
protein or DNA sequence data bases and can compare a 
protein sequence to a DNA sequence data base by translating 
the DNA data base as it is searched. FASTA includes an
additional step in the calculation of the initial pairwise simi-
larity score that allows multiple regions of similarity to be
joined to increase the score of related sequences. The RDF2
program can be used to evaluate the significance of similarity 
scores using a shuffling method that preserves local sequence 
composition. The LFASTA program can display all the re- 
gions of local similarity between two sequences with scores 
greater than a threshold, using the same scoring parameters 
and a similar alignment algorithm; these local similarities can 
be displayed as a "graphic matrix" plot or as individual 
alignments. In addition, these programs have been generalized 
to allow comparison of DNA or protein sequences based on a 
variety of alternative scoring matrices. 



We have been developing tools for the analysis of protein 
and DNA sequence similarity that achieve a balance of 
sensitivity and selectivity on the one hand and speed and 
memory requirements on the other. Three years ago, we 
described the FASTP program for searching amino acid 
sequence data bases (1), which uses a rapid technique for 
finding identities shared between two sequences and exploits 
the biological constraints on molecular evolution. FASTP 
has decreased the time required to search the National 
Biomedical Research Foundation (NBRF) protein sequence 
data base by more than two orders of magnitude and has 
been used by many investigators to find biologically signifi- 
cant similarities to newly sequenced proteins. There is a 
trade-off between sensitivity and selectivity in biological 
sequence comparison: methods that can detect more dis- 
tantly related sequences (increased sensitivity) frequently 
increase the similarity scores of unrelated sequences (de- 
creased selectivity). In this paper we describe a new version 
of FASTP, FASTA, which uses an improved algorithm that 
increases sensitivity with a small loss of selectivity and a 
negligible decrease in speed. We have also developed a 
related program, LFASTA, for local similarity analyses of 
DNA or amino acid sequences. These programs run on 
commonly available microcomputers as well as on larger 
machines. 

METHODS 

The search algorithm we have developed proceeds through 
four steps in determining a score for pairwise similarity.

The publication costs of this article were defrayed in part by page charge 
payment. This article must therefore be hereby marked "advertisement" 
in accordance with 18 U.S.C. §1734 solely to indicate this fact. 



FASTP and FASTA achieve much of their speed and selec-
tivity in the first step, by using a lookup table to locate all
identities or groups of identities between two DNA or amino 
acid sequences during the first step of the comparison (2). 
The ktup parameter determines how many consecutive iden- 
tities are required in a match. For example, if ktup = 4 for a 
DNA sequence comparison, only those identities that occur 
in a run of four consecutive matches are examined. In the 
first step, the 10 best diagonal regions are found, using a
simple formula based on the number of ktup matches and the 
distance between the matches without considering shorter 
runs of identities, conservative replacements, insertions, or 
deletions (1, 3). 
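The lookup-table step described above can be sketched as follows. This is a minimal illustration, not the FASTA source: the function names are hypothetical, and ranking diagonals by a plain count of ktup hits is an assumed simplification of the paper's formula, which also takes the distance between matches into account.

```python
from collections import defaultdict

def ktup_diagonal_hits(query, library, ktup=2):
    """Count ktup-word identities on each diagonal (library offset minus query offset)."""
    # Lookup table: every ktup-long word in the query -> list of start positions.
    table = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        table[query[i:i + ktup]].append(i)

    # Scan the library sequence once, tallying hits by diagonal.
    hits = defaultdict(int)
    for j in range(len(library) - ktup + 1):
        for i in table.get(library[j:j + ktup], ()):
            hits[j - i] += 1
    return dict(hits)

def best_diagonals(hits, n=10):
    """Return the n diagonals with the most ktup hits (simplified ranking, see lead-in)."""
    return sorted(hits, key=lambda d: (-hits[d], d))[:n]

hits = ktup_diagonal_hits("ACGTACGT", "TTACGTACGT", ktup=4)
top = best_diagonals(hits, n=1)
```

Because the table is built once per query, each library sequence is scanned in a single pass, which is where the speed of this step comes from.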

In the second step of the comparison, we rescore these 10 
regions using a scoring matrix that allows conservative 
replacements and runs of identities shorter than ktup to 
contribute to the similarity score. For protein sequences, 
this score is usually calculated using the PAM250 matrix (4), 
although scoring matrices based on the minimum number of 
base changes required for a replacement or on an alternative 
measure of similarity can also be used with FASTA. For 
each of these best diagonal regions, a subregion with maxi-
mal score is identified. We will refer to this region as the
"initial region"; the best initial regions from Fig. 1A are
shown in Fig. 1B.
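Finding the maximal-scoring subregion of a diagonal can be sketched as a running-sum scan. This is an illustrative sketch under assumptions: the function name is hypothetical, and `score` is any residue-pair scoring function supplied by the caller (a toy match/mismatch function stands in for a matrix such as PAM250).

```python
def best_initial_region(query, library, diagonal, score):
    """Return (best_score, query_start, query_end) for the top-scoring ungapped run.

    `diagonal` is library offset minus query offset; the end index is exclusive.
    """
    i_start = max(0, -diagonal)
    i_stop = min(len(query), len(library) - diagonal)
    best = (0, i_start, i_start)
    running, run_start = 0, i_start
    for i in range(i_start, i_stop):
        if running <= 0:
            # A non-positive prefix can never help, so restart the run here.
            running, run_start = 0, i
        running += score(query[i], library[i + diagonal])
        if running > best[0]:
            best = (running, run_start, i + 1)
    return best

def toy_score(a, b):
    # Hypothetical scoring values, for illustration only.
    return 2 if a == b else -3

region = best_initial_region("AAGCTTGG", "CCGCTTAA", 0, toy_score)
```

Here the four identities in the middle of the diagonal give a region score of 8 spanning query positions 2 to 6.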

The FASTP program uses the single best scoring initial 
region to characterize pairwise similarity; the initial scores
are used to rank the library sequences. FASTA goes one 
step further during a library search; it checks to see whether 
several initial regions may be joined together. Given the 
locations of the initial regions, their respective scores, and a 
"joining" penalty (analogous to a gap penalty), FASTA
calculates an optimal alignment of initial regions as a com- 
bination of compatible regions with maximal score. FASTA 
uses the resulting score to rank the library sequences. We 
limit the degradation of selectivity by including in the 
optimization step only those initial regions whose scores are 
above a threshold. This process can be seen by comparing
Fig. 1B with Fig. 1C. Fig. 1B shows the 10 highest scoring
initial regions after rescoring with the PAM250 matrix; the
best initial region reported by FASTP is marked with an
asterisk. Fig. 1C shows an optimal subset of initial regions
that can be joined to form a single alignment. 
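The joining step can be sketched as a small dynamic program over the initial regions. This is a sketch under stated assumptions, not the published implementation: the function name, the region tuple layout, the default penalty of 20, and the compatibility rule (a region may follow another only if it starts at or after the other's end in both sequences) are all illustrative choices.

```python
def join_initial_regions(regions, join_penalty=20, threshold=0):
    """Best total score obtainable by chaining compatible initial regions.

    Each region is (score, q_start, q_end, l_start, l_end) with exclusive ends.
    Regions scoring at or below `threshold` are ignored.
    """
    kept = sorted((r for r in regions if r[0] > threshold),
                  key=lambda r: (r[1], r[3]))
    best_ending_at = []
    for k, (score, q_start, _, l_start, _) in enumerate(kept):
        best = score  # the region on its own
        for j in range(k):
            _, _, q_end_j, _, l_end_j = kept[j]
            if q_end_j <= q_start and l_end_j <= l_start:
                # Extend the best chain ending at region j, paying one joining penalty.
                best = max(best, best_ending_at[j] + score - join_penalty)
        best_ending_at.append(best)
    return max(best_ending_at, default=0)

regions = [(50, 0, 20, 0, 20), (40, 30, 50, 35, 55), (30, 10, 25, 60, 75)]
joined = join_initial_regions(regions, join_penalty=20)
```

With a penalty of 20 the first two regions are joined for a score of 70; raising the penalty to 45 makes the single best region (50) the better choice, which mirrors how the penalty limits loss of selectivity.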

In the fourth step of the comparison, the highest scoring
library sequences are aligned using a modification of the
optimization method described by Needleman and Wunsch
(5) and Smith and Waterman (6). This final comparison
considers all possible alignments of the query and library
sequence that fall within a band centered around the highest
scoring initial region (Fig. 1D). With the FASTP program,
optimization frequently improved the similarity scores of 
related sequences by factors of 2 or 3. Because FASTA 
calculates an initial similarity score based on an optimization 
of initial regions during the library search, the initial score is 



Abbreviation: NBRF, National Biomedical Research Foundation.






Biochemistry: Pearson and Lipman 







Fig. 1. Identification of sequence similarities by FASTA. The 
four steps used by the FASTA program to calculate the initial and 
optimal similarity scores between two sequences are shown. (A) 
Identify regions of identity. (B) Scan the regions using a scoring
matrix and save the best initial regions. Initial regions with scores
less than the joining threshold (27) are dashed. The asterisk denotes
the highest scoring region reported by FASTP. (C) Optimally join
initial regions with scores greater than a threshold. The solid lines 
denote regions that are joined to make up the optimized initial score. 
(D) Recalculate an optimized alignment centered around the highest 
scoring initial region. The dotted lines denote the bounds of the 
optimized alignment. The result of this alignment is reported as the 
optimized score. 

much closer to the optimized score for many sequences. In 
fact, unlike FASTP, the FASTA method may yield initial 
scores that are higher than the corresponding optimized 
scores. 
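The banded optimization of the fourth step can be sketched as a local-alignment recurrence that only fills cells near one diagonal. This is a simplified illustration, not the FASTA code: the function name is hypothetical, a single linear gap penalty replaces the separate penalties for the first and subsequent residues in a gap, and the scoring values in the usage lines are invented for the example.

```python
def banded_local_score(query, library, diagonal, half_width, score, gap=-4):
    """Local alignment score restricted to cells with |(j - i) - diagonal| <= half_width."""
    n, m = len(query), len(library)
    prev = [0] * (m + 1)
    best = 0
    for i in range(1, n + 1):
        curr = [0] * (m + 1)  # cells outside the band stay at zero
        lo = max(1, i + diagonal - half_width)
        hi = min(m, i + diagonal + half_width)
        for j in range(lo, hi + 1):
            curr[j] = max(
                0,
                prev[j - 1] + score(query[i - 1], library[j - 1]),  # aligned pair
                prev[j] + gap,      # query residue opposite a gap
                curr[j - 1] + gap,  # library residue opposite a gap
            )
            best = max(best, curr[j])
        prev = curr
    return best

def toy_score(a, b):
    # Hypothetical match/mismatch values, for illustration only.
    return 2 if a == b else -3

ungapped = banded_local_score("ACGTACGT", "ACGTTACGT", 0, 0, toy_score)
gapped = banded_local_score("ACGTACGT", "ACGTTACGT", 0, 1, toy_score)
```

Restricting the band keeps the work proportional to sequence length times band width; widening it from 0 to 1 here lets the alignment bridge the single inserted residue.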

Local Similarity Analyses. Molecular biologists are often
interested in the detection of similar subsequences within 
longer sequences. In contrast to FASTP and FASTA, which 
report only the one highest scoring alignment between two 
sequences, local sequence comparison tools can identify 
multiple alignments between smaller portions of two se- 
quences. Local similarity searches can clearly show the 
results of gene duplications (see Fig. 2) or repeated struc- 
tural features (see Fig. 3) and are frequently displayed using 
a "graphic matrix" plot (7), which allows one to detect
regions of local similarity by eye. Optimal algorithms for 
sensitive local sequence comparison (6, 8, 9) can have 
tremendous computational requirements in time and mem- 
ory, which make them impractical on microcomputers and, 
when comparing longer sequences, on larger machines as 
well. 

The program for detecting local similarities, LFASTA, 
uses the same first two steps for finding initial regions that
FASTA uses. However, instead of saving 10 initial regions, 
LFASTA saves all diagonal regions with similarity scores 
greater than a threshold. LFASTA and FASTA also differ in 
the construction of optimized alignments. Instead of focus- 
ing on a single region, LFASTA computes a local alignment 
for each initial region. Thus LFASTA considers all of the 
initial regions shown in Fig. 1B, instead of just the diagonal
shown in Fig. 1D. Furthermore, LFASTA considers not




only the band around each initial region but also potential 
sequence alignments for some distance before and after the 
initial region. Starting at the end of the initial region, an 
optimization (6) proceeds in the reverse direction until all 
possible alignment scores have gone to zero. The location of 
the maximal local similarity score in the reverse direction is 
then used to start a second optimization that proceeds in the 
forward direction. An optimal path starting from the forward 
maximum is then displayed (5). The local homologies can be 
displayed as sequence alignments (see Fig. 2B) or on a 
two-dimensional graphic matrix style plot (see Figs. 2A and 
3). 

Statistical Significance. The rapid sequence comparison 
algorithms we have developed also provide additional tools 
for evaluating the statistical significance of an alignment. 
There are approximately 5000 protein sequences, with 1.1 
million amino acid residues, in the NBRF protein sequence 
library, and any computer program that searches the library 
by calculating a similarity score for each sequence in the 
library will find a highest scoring sequence, regardless of 
whether the alignment between the query and library se- 
quence is biologically meaningful or not. Accompanying the 
previous version of FASTP was a program for the evaluation 
of statistical significance, RDF, which compares one se- 
quence with randomly permuted versions of the potentially 
related sequence. 

We have written a new version of RDF (RDF2) that has 
several improvements. (i) RDF2 calculates three scores for
each shuffled sequence: one from the best single initial region
(as found by FASTP), a second from the joined initial regions
(used by FASTA), and a third from the optimized diagonal.
(ii) RDF2 can be used to evaluate amino acid or DNA
sequences and allows the user to specify the scoring matrix to
be employed. Thus sequences found using the PAM250
scoring matrix can be evaluated using the identity or genetic
code matrix. (iii) The user may specify either a global or local
shuffle routine.

Locally biased amino acid or nucleotide composition is 
perhaps the most common reason for high similarity scores 
of dubious biological significance (10). High scoring align- 
ments between query and library sequences may be due to 
patches of hydrophobic or charged amino acid residues or to 
A + T- or G + C-rich regions in DNA. A simple Monte Carlo 
shuffle analysis that constructs random sequences by taking 
each residue in one sequence and placing it randomly along 
the length of the new sequence will break up these patches of 
biased composition. As a result, the scores of the shuffled
sequences may be much lower than those of the unshuffled
sequence, and the sequences will appear to be related.
Alternatively, shuffled sequences can be constructed by
permuting small blocks of 10 or 20 residues so that, while the
order of the sequence is destroyed, the local composition is
not. By shuffling the residues within short blocks along the
sequence, patches of G + C- or A + T-rich regions in DNA,
for example, are undisturbed. Evaluating significance with a
local shuffle is more stringent than the global approach, and
there may be some circumstances in which both should be 
used in conjunction. Whereas two proteins that share a 
common evolutionary ancestor may have clearly significant 
similarity scores using either shuffling strategy, proteins 
related because of secondary structure or hydropathic pro- 
file may have similarity scores whose significance decreases 
dramatically when the results of global and local shuffling 
are compared. 
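The difference between the two shuffling strategies can be sketched in a few lines. This is an illustration only, not the RDF2 source: the function names and the default window of 10 residues are assumptions, and Python's standard `random` module stands in for whatever generator the program uses.

```python
import random

def global_shuffle(seq, rng):
    """Permute every residue across the whole sequence."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)

def local_shuffle(seq, rng, window=10):
    """Permute residues only within consecutive blocks of `window` residues."""
    shuffled = []
    for start in range(0, len(seq), window):
        block = list(seq[start:start + window])
        rng.shuffle(block)
        shuffled.extend(block)
    return "".join(shuffled)

# A sequence with two compositionally biased patches.
seq = "A" * 10 + "G" * 10
globally = global_shuffle(seq, random.Random(0))
locally = local_shuffle(seq, random.Random(0), window=10)
```

The local shuffle leaves the A-rich and G-rich patches where they were, so a score that depends only on biased composition survives it, while the global shuffle is free to mix the two patches.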

Implementation. The FASTA/LFASTA package of se-
quence analysis tools is written in the C programming lan-
guage and has been implemented under the Unix, VAX/VMS,
and IBM PC DOS operating systems. Versions of the
program that run on the IBM PC are limited to query se-
quences of 2000 residues; library sequences can be any
length. Copies of the program are available from the authors.

Table 1. FASTA and FASTP initial scores of the T-cell receptor
(RWMSAV) versus the NBRF data base

                                                 Initial score
NBRF code  Sequence                              FASTA   FASTP
RWHUAV     T-cell receptor α chain                 155      98
KIHURE     Ig κ chain V-I region                   127     111
KVMS50     Ig κ chain V region                     149      62
KVMSM6     Ig κ chain precursor V regions          141      64
KVRB29     Ig κ chain V region                     126      54
L3HUSH     Ig λ chain V-III region                  90      47
KVMS41     Ig κ chain precursor V region            87      87
RWMSBV     T-cell receptor β-chain precursor        94      94
RWHUVY     T-cell receptor β-chain precursor        91      59
RWHUGV     T-cell receptor γ-chain precursor        87      61
RWHUT4     T-cell surface glycoprotein T4           86      63
RWMSVB     T-cell receptor γ-chain precursor        71      41
HVMS44     Ig heavy-chain V region                  67      36
GIHUDW     Ig heavy-chain V-II region               62      35

The average FASTP score = 26.1 ± 6.8 (mean ± SD). The
average FASTA score = 26.2 ± 7.2 (mean ± SD). The mean and
SD were computed excluding scores >54. V, Variable.

Although FASTA and LFASTA were designed for protein 
and DNA sequence comparison, they use a general method 
that can be applied to any alphabet with arbitrary match/ 
mismatch scoring values. All the scoring parameters, includ- 
ing match/mismatch values, values for the first residue in a 
gap and subsequent residues in the gap, and other parame- 
ters that control the number of sequences to be saved and 
the histogram intervals, can be specified without changing 
the program. 

EXAMPLES 

Comparison of FASTA with FASTP. To demonstrate the 
superiority of the FASTA method for computing the initial 
score, we compared the protein sequence of a T-cell receptor 
α chain (NBRF code RWMSAV) with all sequences in the
NBRF protein data base‡ and computed initial scores with
both the present and previous methods. The T-cell receptor is 
a member of the immunoglobulin superfamily; in Release 12.0 
of the data base, this superfamily has 203 members. FASTP 
placed 160 immunoglobulin superfamily sequences in the 200 
top-scoring sequences; 57 related sequences received initial 
scores less than four standard deviations above the mean 
score. FASTA placed 180 superfamily members in the 200 
top-scoring sequences; only 20 related sequences scored 
below four standard deviations above the mean. Table 1 con- 
tains specific examples from this data base search. Although 
there is often little difference in the two methods, this ex- 
ample shows that in a number of cases the new method ob- 
tains significantly higher scores between related sequences. 
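A worked check of the four-standard-deviation criterion, using only numbers reported in Table 1 (the GIHUDW row and the means and SDs in the table footnote). The helper name `z_score` is an assumption introduced for this illustration.

```python
def z_score(score, mean, sd):
    """Number of standard deviations by which a score exceeds the library mean."""
    return (score - mean) / sd

# Means and SDs as reported in the Table 1 footnote.
FASTA_MEAN, FASTA_SD = 26.2, 7.2
FASTP_MEAN, FASTP_SD = 26.1, 6.8

# GIHUDW (Ig heavy-chain V-II region) scored 62 with FASTA and 35 with FASTP.
z_fasta = z_score(62, FASTA_MEAN, FASTA_SD)
z_fastp = z_score(35, FASTP_MEAN, FASTP_SD)
print(f"FASTA z = {z_fasta:.2f}, FASTP z = {z_fastp:.2f}")
```

For this sequence the FASTA initial score lies nearly five standard deviations above the mean, whereas the FASTP score lies between one and two, so only the FASTA score clears the four-standard-deviation line used in the text.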

Nucleic Acid Data Base Search. FASTA can also be used to 
search DNA sequence data bases, either by comparing a 
DNA query sequence to the DNA library or by comparing an 
amino acid query sequence to the DNA library by translating 
each library DNA sequence in all six possible reading 
frames. We compared the 660-nucleotide rat transforming 
growth factor type a mRNA (GenBank locus RATTGFA) 
with all the mammalian sequences in Release 48 of Gen-
Bank§. We set ktup = 4 (see Methods), and the search was
completed in under 15 min on an IBM PCAT microcom- 



‡Protein Identification Resource (1987) Protein Sequence Database
(Natl. Biomed. Res. Found., Washington, DC), Release 12.

§EMBL/GenBank Genetic Sequence Database (1987) (Intelligenet-
ics, Mountain View, CA), Tape Release 48.



Table 2. DNA data base search of rat transforming growth factor
(RATTGFA) versus mammalian sequences

GenBank                                            Score
locus      Sequence                         Initial      Optimized
HUMTGFAM   Human TGF mRNA                   [illegible]  [illegible]
HUMTGFA2   Human TGF gene [illegible]       [illegible]  [illegible]
HUMTGFA1   Human TGF gene (5' end)          224          381
MUSRGEB3   Mouse 18S-5.8S-28S rRNA gene     140          107
MUSRGE52   Mouse 18S-5.8S-28S rRNA gene     140          107
MUSMHDD    MHC class I H-2D                 122          78
HUMMETIF1  Metallothionein (MT)IF gene      116          92
MUSRGLP    45S rRNA (5' end)                115          83
HUMPS2     pS2 mRNA                         105          106
MUSCIAII   α-1 type I procollagen           86           89

The 10 sequences having the highest initial scores are given. TGF,
transforming growth factor; MHC, major histocompatibility com-
plex.



puter. The 10 top-scoring library sequences are shown in 
Table 2. Although it can be seen that the 3 top-scoring 
sequences are clearly related to RATTGFA, there are other 
high-scoring sequences that are probably not related, and the 
mouse epidermal growth factor, found in the translated data 
base search (Table 3), is not found among the top-scoring 
sequences. 

To further examine the similarity detected between RAT-
TGFA and MUSRGEB3, a mouse rRNA gene cluster, we
used the RDF2 program for Monte Carlo analysis of statis- 
tical significance (the window for local shuffling was set to 10 
bases). Of the 50 shuffled comparisons (data not shown), 1 
obtained an initial score greater than 140 (the observed initial 
score), and 9 shuffled sequences obtained optimized scores 
greater than 107 (the observed optimized score). Therefore, 
the similarity between RATTGFA and MUSRGEB3 is un- 
likely to be significant. 

Translated Nucleic Acid Data Base Search. When searching
for sequences that encode proteins, amino acid sequence 
comparisons are substantially more sensitive than DNA se- 
quence comparisons because one can use scoring matrices 
like the PAM250 matrix that discriminate between conserva- 
tive and nonconservative substitutions. A variant of FASTA, 
TFASTA, can be used to compare a protein sequence to a 
DNA sequence library; it translates the DNA sequences into 
each of six possible reading frames "on-the-fly." TFASTA
translates the DNA sequences from beginning to end; it 
includes both intron and exon sequences in the translated 
protein sequence; termination codons are translated into 
unknown (X) amino acids. Table 3 shows the results of a 
translating search of the mammalian sequences in the Gen- 
Bank DNA data base using the RATTGFA protein sequence 
as the query and ktup = 1. In the translated search, the mouse 
epidermal growth factor now obtains an initial score higher 
than any unrelated sequences; however, HUMTGFA1, which
was found in the DNA data base search but only contains 13 
translated codons, is no longer among the top scoring se- 
quences. 
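The on-the-fly translation described above can be sketched as follows. This is an illustrative sketch, not the TFASTA source: the function names are hypothetical, the standard genetic code is assumed, and codons containing characters other than A, C, G, or T are also mapped to X as an added assumption.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks termination.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna):
    """Translate from the first base; termination codons become 'X'."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        protein.append("X" if aa == "*" else aa)
    return "".join(protein)

def six_frame_translations(dna):
    """Three forward frames followed by three reverse-complement frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:])
            for strand in (dna, reverse) for offset in range(3)]

frames = six_frame_translations("ATGGCCTAA")
```

Translating termination codons as X rather than stopping lets the comparison run straight through introns and frame boundaries, at the cost of scoring some non-coding stretches.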

Local Similarities. Fig. 2 displays the output of a local 
similarity analysis (ktup = 4) of CHPHBAIM, a chimpanzee
α1-globin mRNA, and RABHBAPT, a rabbit α-globin gene,
including the complete coding sequence and a flanking
pseudo-θ1-globin gene. LFASTA can either display a graphic
matrix style plot of the local homologies (Fig. 2A) or the
alignments themselves (Fig. 2B). The right-most three align-
ments (Fig. 2A) match the corresponding regions of the
mRNA to exon subsequences from the pseudogene. We note 
that the FASTA initial score for the comparison of CHPH- 






Table 3. Translated DNA data base search of rat transforming growth factor (RATTGFA) versus
mammalian sequences

GenBank                                                             Score
locus      Sequence                                Frame  Initial  Optimized
RATTGFA    Rat TGF type α                          1      816      816
HUMTGFAM   Human TGF mRNA                          2      671      770
HUMTGFA2   Human TGF gene                          1      204      205
MUSEGF     Mouse EGF mRNA                          3      93       129
MUSMHAB3   Mouse MHC class II H2-IAβ               1      91       58
MUSIGCD17  Mouse Ig germ-line DJC region           3'     85       48
HUMESTR    Human estrogen receptor                 3      83       65
RATINS1    Rat insulin 1 (Ins-1) gene              2      81       63
MUSTHYS1   Mouse thymidylate synthase              2      80       63
HUMPNU3    Human purine nucleoside phosphorylase   1'     80       52

The 10 sequences having the highest initial scores are given. TGF, transforming growth factor; EGF,
epidermal growth factor; D, diversity; J, joining; C, constant; MHC, major histocompatibility
complex.



BAIM and RABHBAPT would be based on the three globin 
gene exons, while the FASTP initial score would be based on 
a single conserved exon. 

The Smith-Waterman optimization used in the LFASTA 
program allows the detection of more subtle features than 
can be detected by the eye using a graphic matrix plot, 
because the path traced is locally optimal, even though it 
may only have a slightly higher density of identities and 
conservative replacements. Fig. 3 shows a plot from a local 
similarity self-comparison of the myosin heavy chain from 
the nematode Caenorhabditis elegans (MWKW) using the 
PAM250 matrix. The amino-terminal half of the molecule 
forms a large globular head without any periodic structure; 
the solid line down the main diagonal represents the ex- 
pected identity of the sequence with itself. The symmetrical 
parallel lines along the carboxyl-terminal half of the mole- 
cule correspond to the 28-residue repeat responsible for the 
a-helical coiled-coil structure of the rod segment. 

DISCUSSION 

In searching a data base, one is attempting to measure 
relatedness; in aligning two homologous sequences, one is 



trying to choose the most likely set of mutations since their 
divergence from a common ancestral sequence. Thus any 
tool for the analysis of sequence similarities must contain 
within it an implicit model of molecular evolution. An 
algorithm that guarantees the optimality of its alignments 
based on a set of scoring rules must be judged on how well 
these rules fit our current understanding of the process of 
molecular evolution. Algorithms that sacrifice realism to 
achieve greater efficiency, regardless of their mathematical 
rigor, require careful empirical evaluation. 

Even though the tools we have developed use rigorous 
algorithms at each step and incorporate a realistic model of 
evolution, their hierarchical nature makes them heuristic. The
original FASTP program has had the benefit of extensive use 
and evaluation by a wide variety of scientists. The FASTA 
program exploits refinements of the previous approach that 
result in a significant improvement in sensitivity. The LFA- 
STA local similarity analysis program is also a logical ex- 
tension of the FASTP approach. 

Because of the trade-offs between sensitivity and selectiv- 
ity in data base searches, the results of any search, and 
particularly those that result in alignment scores that are not 
clearly separated from the distribution of all library sequence 







B 

10 20 30 40 50 60 

CHPHBA GACTCAGAAAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCCG 

:::: ::: X ::::::::::::::::: : :: :::::::::::: ::::: : : 
RABHBA GACTGAGAAGGAA-CCACCATGGTGCTGTCTCCCGCTGACAAGACCAACATCAAGACTG 

180 190 200 210 220 

70 80 90 100 110 

CHPHBA CCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGG 

       ::::::  ::: :::::   :::: :::::::::::: :: :::::: ::::::::

RABHBA CCTGGGAAAAGATCGGCAGCCACGGTGGCGAGTATGGCGCCGAGGCCGTGGAGAGG 
230 240 250 260 270 280 

Fig. 2. Local comparison of an α-globin mRNA sequence with an α-globin gene cluster. An ape α1-globin mRNA sequence (GenBank 
sequence CHPHBA1M) was compared with a rabbit α-globin gene sequence (RABHBAPT) containing a second pseudo-θ-globin gene using the 
LFASTA program. (A) A plot of the homologous regions shared by the two sequences. (B) One of the alignments between the mRNA sequence 
and the rabbit α-globin gene (nucleotides 171-855). Three other alignments between the mRNA sequence and the α-globin gene and three 
alignments with the pseudo-θ-globin gene (nucleotides 3200-3770) were calculated but are not shown. There is 84.3% identity in the 115 
nucleotide overlap. The initial region and optimized scores using LFASTA are 284 and 304, respectively. X denotes the ends of the initial region 
found by LFASTA. 



2448 Biochemistry: Pearson and Lipman Proc. Natl. Acad. Sci. USA 85 (1988) 




Fig. 3. Repeated structure in the 
myosin heavy chain. LFASTA was used 
to compare the Caenorhabditis elegans 
myosin heavy chain protein sequence 
(NBRF code MWKW) with itself using 
the PAM250 scoring matrix. The solid, 
dashed, and dotted lines denote decreas- 
ing similarity scores. The solid lines had 
initial region scores greater than 80 and 
optimized local scores greater than 150; 
the longer dashed lines had initial region 
and optimized local scores greater than 
65 and 120, respectively, and the shorter 
dashed lines had initial region and opti- 
mized local scores greater than 50 and 
100, respectively. Homologous regions 
with lower scores are plotted with dots. 



scores, must be carefully evaluated (1, 11). The Monte Carlo 
analysis of statistical significance provided by a program 
such as RDF2 can often be critical in evaluating a borderline 
similarity. Previously we suggested ranges of z values [(ob- 
served score - mean of shuffled scores)/standard deviation 
of shuffled scores] corresponding to approximate signifi- 
cance levels. However, the z values determined in a Monte 
Carlo analysis become less useful as the distribution of 
shuffled scores diverges from a normal distribution, as is 
found with FASTA. Therefore, we now focus on the highest 
scores of the shuffled sequences. For example, if in 50 
shuffled comparisons, several random scores are as high or 
higher than the observed score, then the observed similarity 
is not a particularly unlikely event. One can have more 
confidence if in 200 shuffled comparisons, no random score 
approaches the observed score. In general, our experience 
has led us to be conservative in evaluating an observed 
similarity in an unlikely biological context. 
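
The shuffling test described above can be sketched in a few lines. This is only an illustration under stated assumptions: `diagonal_score` is a toy stand-in (identities on the best ungapped diagonal), not the FASTA or RDF2 scoring scheme, and the function names and the default of 200 shuffles are chosen here for the example. The sketch reports both the z value and the count of shuffled scores that reach the observed score, since the text recommends relying on the latter when the shuffled scores are not normally distributed.

```python
import random
import statistics


def diagonal_score(a, b):
    """Toy similarity score: most identities on any ungapped diagonal.

    A stand-in for a real alignment score; it is NOT the FASTA/RDF2 score.
    """
    best = 0
    for offset in range(-(len(b) - 1), len(a)):
        matches = sum(
            1
            for i in range(max(0, offset), min(len(a), len(b) + offset))
            if a[i] == b[i - offset]
        )
        best = max(best, matches)
    return best


def shuffle_test(query, library_seq, n_shuffles=200, seed=0):
    """Compare an observed score with scores against shuffled sequences."""
    rng = random.Random(seed)
    observed = diagonal_score(query, library_seq)
    residues = list(library_seq)
    shuffled = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # same composition, randomized order
        shuffled.append(diagonal_score(query, "".join(residues)))
    mean = statistics.mean(shuffled)
    sd = statistics.pstdev(shuffled)
    # z = (observed score - mean of shuffled scores) / standard deviation
    z = (observed - mean) / sd if sd > 0 else None
    return {
        "observed": observed,
        "z": z,
        "max_shuffled": max(shuffled),
        "n_as_high": sum(s >= observed for s in shuffled),
    }
```

A result with `n_as_high` of several out of 50 would suggest the observed similarity is not an unlikely event, whereas zero out of 200 gives more confidence.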

These programs provide a group of sequence analysis 
tools that use a consistent measure for scoring similarity and 
constructing alignments. FASTA, RDF2, and LFASTA all 
use the same scoring matrices and similar alignment algo- 
rithms, so that potentially related library sequences discov- 
ered after the search of a sequence data base can be 
evaluated further from a variety of perspectives. In addition, 
LFASTA can also show alternative alignments between 
sequences with periodic structures or duplications. 

 1. Lipman, D. J. & Pearson, W. R. (1985) Science 227, 1435-1441. 
 2. Dumas, J. P. & Ninio, J. (1982) Nucleic Acids Res. 10, 197-206. 
 3. Wilbur, W. J. & Lipman, D. J. (1983) Proc. Natl. Acad. Sci. USA 80, 726-730. 
 4. Dayhoff, M., Schwartz, R. M. & Orcutt, B. C. (1978) in Atlas of Protein Sequence and Structure, ed. Dayhoff, M. (Natl. Biomed. Res. Found., Silver Spring, MD), Vol. 5, Suppl. 3, pp. 345-352. 
 5. Needleman, S. & Wunsch, C. (1970) J. Mol. Biol. 48, 444-453. 
 6. Smith, T. & Waterman, M. S. (1981) J. Mol. Biol. 147, 195-197. 
 7. Maizel, J. & Lenk, R. (1981) Proc. Natl. Acad. Sci. USA 78, 7665-7669. 
 8. Goad, W. & Kanehisa, M. (1982) Nucleic Acids Res. 10, 247-263. 
 9. Sellers, P. H. (1979) Proc. Natl. Acad. Sci. USA 76, 3041. 
10. Lipman, D. J., Wilbur, W. J., Smith, T. F. & Waterman, M. S. (1984) Nucleic Acids Res. 12, 215-226. 
11. Doolittle, R. (1981) Science 214, 149-159. 



Exhibit 29 




Volume 13 Number 4 



August 1997 






OXFORD 
UNIVERSITY 



ISSN 0266-7061 



Vol. 13 no. 4 1997 

CABIOS INVITED REVIEW Pages 325-332 



Identifying distantly related protein 
sequences 



William R. Pearson 



Introduction 

The most powerful method available today for inferring the 
biological function of a gene (or the protein that it encodes) 
from its sequence is similarity searching on protein and DNA 
sequence databases. With the development of rapid methods 
for sequence comparison, both with heuristic algorithms and 
powerful parallel computers, discoveries based solely on 
sequence homology have become routine. Indeed, the vast 
majority of the gene identifications in the recent descriptions 
of the Haemophilus influenzae (Fleischmann et al., 1995), 
Mycoplasma genitalium (Fraser et al., 1995), yeast (Dujon, 
1996) and Methanococcus jannaschii (Bult et al., 1996) 
genomes are based only on protein sequence similarity. As 
more complete genomes become available, protein sequence 
comparison will become an even more powerful tool for 
understanding biological function. 

Protein sequence comparison is a powerful tool because of 
the enormous amount of information that is preserved 
throughout the evolutionary process. For many protein 
sequences, an evolutionary history can be traced back 
1-2.5 billion years. Proteins that share a common ancestor are 
called homologous. Sequence comparison is most informa- 
tive when it detects homologous proteins. Homologous 
proteins always share a common three-dimensional folding 
structure and they often share common active sites or binding 
domains. Frequently, homologous proteins share common 
functions, but sometimes they do not. Our ability to 
characterize the biological properties of a protein based on 
sequence data alone stems almost exclusively from properties 
conserved through evolutionary time. Predictions of common 
properties for non-homologous proteins — similarities that 
have arisen by convergence — are much less reliable. 

While sequence similarity searching is a routine method 
for characterizing newly determined DNA and protein 
sequences, researchers sometimes fail to exploit fully the 
information that is available from similarity searches of 
protein sequence databases. This review examines two 
strategies for using similarity search information more 
effectively: (i) looking for alignments that span an entire 
folding domain, rather than a short sequence motif, and (ii) 



Department of Biochemistry, Jordan Hall #440, University of Virginia, 
Charlottesville, VA 22908, USA 
E-mail: wrp@virginia.EDU 



re-examining sequences with high, but not statistically 
significant, similarity scores. For a broader perspective on 
sequence comparison and identification of homologous 
proteins, see Altschul et al. (1994) and Pearson (1996). 

Members of the trypsin-like serine protease superfamily 
('trypsin-like' distinguishes these serine proteases from other 
serine protease families, notably the subtilisins, that use 
serine in the active site but have very different structures and 
thus are not homologous) provide a classic example of a 
family of proteins with a highly conserved active site. While 
highly conserved motifs from this site are informative, serine 
proteases share similarity throughout the length of the 
protease domain, not just around the active site residues. 

The trypsin-like serine protease family is quite diverse, 
with a number of very distantly related homologues. Thus, it 
can be difficult to demonstrate that Streptomyces griseus 
protease A and protease B are homologous based on sequence 
similarity alone. The second part of this review shows that by 
carefully re-examining sequences with high-scoring, but not 
statistically significant, similarity scores, it is possible to 
identify several proteins that share significant similarity with 
both the mammalian trypsin-like serine proteases and their 
distant prokaryotic homologues. 

Motifs, homology, and the serine proteases 

A common misconception in protein sequence comparison is 
that homologous proteins share sequence similarity mostly 
(or only) near the active site regions or other functional 
domains in a protein. This partly accounts for the popularity 
of databases of sequence motifs, such as PROSITE (Bairoch, 
1991), which tabulate amino acid patterns that can be used to 
identify most of the members of a protein family. For features 
that result from convergence to a common property, such as 
glycosylation and phosphorylation sites, sequence motifs are 
uniquely informative. However, for features that result from 
divergence from a common ancestor, such as the serine 
protease active site residues, sequence motifs provide only a 
highly abstracted summary of the sequence conservation in a 
family. Because they share a common three-dimensional 
structure, homologous proteins share sequence similarity 
over large regions, typically the entire protein fold. 

The trypsin-like serine protease superfamily is a classic 
example of a protein family whose members share several 
simple motifs that are diagnostic for the family (Figure 1). 



© Oxford University Press 






ID   TRYPSIN_HIS; PATTERN. 
AC   PS00134; 
DE   Serine proteases, trypsin family, histidine active site. 
PA   [LIVM]-[ST]-A-[STAG]-H-C. 
NR   /TOTAL=158(158); /POSITIVE=154(154); /UNKNOWN=2(2); /FALSE_POS=2(2); 
NR   /FALSE_NEG=11(11); 
CC   /TAXO-RANGE=??EP?; /MAX-REPEAT=1; 
CC   /SITE=5,active_site; 

ID   TRYPSIN_SER; PATTERN. 
AC   PS00135; 
DE   Serine proteases, trypsin family, serine active site. 
PA   G-D-S-G-G. 
NR   /TOTAL=160(160); /POSITIVE=151(151); /UNKNOWN=1(1); /FALSE_POS=8(8); 
NR   /FALSE_NEG=16(16); 
CC   /TAXO-RANGE=??EP?; /MAX-REPEAT=1; 
CC   /SITE=3,active_site; 

Fig. 1. Patterns for serine proteases. Patterns from PROSITE that identify 152/163 TRYPSIN_HIS or 143/159 TRYPSIN_SER members of the trypsin-like 
serine protease protein family. 



Serine proteases cleave peptide bonds using a 'catalytic triad' 
of histidine, serine and aspartic acid that are required for the 
protease function. Because these residues are so highly 
conserved, patterns that focus on two of the regions (Figure 1) 
can be used to identify every member of the serine protease 
family. (The subtilisin-like serine proteases use exactly the 
same catalytic triad, but the families are non-homologous 
with very different three-dimensional structures.) 
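
As a rough illustration of how such patterns are applied, the sketch below converts a PROSITE-style pattern into a regular expression and scans a sequence with it. It is a minimal sketch, not PROSITE's own software: it assumes only the basic pattern elements (single residues, `x`, `[...]` sets, `{...}` exclusions and `(n)` or `(n,m)` repeats), the function names are invented for this example, and the two test peptides are made-up strings rather than database sequences.

```python
import re

# One pattern element: a residue, x, a [set] or an {exclusion},
# optionally followed by a repeat count (n) or (n,m).
_ELEMENT = re.compile(r"([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern such as 'G-D-S-G-G.' to a regex."""
    body = pattern.strip().rstrip(".")
    pieces = []
    for element in body.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except these
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        pieces.append(core)
    return "".join(pieces)


def find_sites(pattern, sequence):
    """Return (1-based start, matched text) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]
```

A hit from such a scan says only that the short pattern is present; as the surrounding text argues, it does not by itself establish homology.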

Most members of the trypsin-like serine protease super- 
family are readily identified by sequence similarity searching. 
The results from a typical protein database search using the 
Smith-Waterman algorithm (Smith and Waterman, 1981) are 
shown in Figure 2. All of the eukaryotic trypsin-like serine 
proteases share statistically significant similarity with the 
bovine trypsin query sequence. However, as is often the case 
for divergent protein families, some prokaryotic members of 
the family do not share statistically significant similarity with 
bovine trypsin. These sequences are italicized in Figure 2; 
their membership in the serine protease family is usually 
inferred from their common three-dimensional structures 
(Figure 5). 

This absolute conservation of residues in the 'catalytic 
triad' might suggest that sequence similarities shared by 
members of this family are limited to those regions. Indeed, 
two of the four 'High-Scoring Segment Pairs' (Altschul et al., 
1994) reported by BLASTP correspond to TRYP_HIS and 
TRYP_SER regions (Figure 3). However, similarity in the 
serine proteases extends from one end of the protein to the 
other, with conservation throughout the sequence. Indeed, 
many parts of the protein are conserved more strongly than the 
region around the aspartic acid in the catalytic triad (Figure 
3). Thus, while the residues in the catalytic triad are an 
essential feature for a functional serine protease, it is the 
serine protease fold (two domains containing anti-parallel 
β barrels; Figure 5) that is required to bring these 
residues together. The evolutionary pressure to conserve the 
trypsin-like serine protease fold ensures that the folding 
domains share similar amino acids. 

The requirement for a common folded structure in 
homologous proteins usually causes similarities to extend 
from one end of the protein to the other. With the exception of 
mosaic proteins that are the result of recent exon shuffling 
(Doolittle, 1995), optimal local sequence similarity is rarely 
confined only to a portion of two homologous sequences. (In 
mosaic proteins, the similarity extends throughout the exon- 
shuffled domain.) In general, it is incorrect to speak of 
homology at the N terminus or C terminus, even though only 
a portion of the protein may be aligned in 'High-Scoring 
Segment Pairs' by BLASTP. Indeed, the length of the locally 
similar region can sometimes be used to distinguish low- 
scoring related sequences from high-scoring unrelated 
sequences. Thus, all but two of the library sequences 
(including four with expectation values >0.02) that align 
over >80% of the length of the TRYP_BOVIN query 
sequence are members of the trypsin-like serine protease 
family. Figure 4 displays the locally similar regions for the 
related and unrelated sequences in Figure 2; the highest 
scoring unrelated sequences tend to have relatively short 
(<100 residue) regions of higher similarity (~30% identical), 
while related sequences have longer (140-300 residue) 
aligned regions, sometimes with lower (25%) sequence 
identity. In general, alignments with longer, lower identity are 
more significant than those with shorter, higher identity. 
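
The length criterion in this paragraph can be made concrete with a small helper. The sketch below is an illustration only: the function names are invented here, the 0.80 default mirrors the ">80% of the length" figure quoted above, and the coordinates in the usage note are hypothetical rather than taken from Figure 4.

```python
def query_coverage(align_start, align_end, query_length):
    """Fraction of the query spanned by an alignment (1-based, inclusive)."""
    if not 1 <= align_start <= align_end <= query_length:
        raise ValueError("alignment coordinates must lie within the query")
    return (align_end - align_start + 1) / query_length


def spans_most_of_query(align_start, align_end, query_length, minimum=0.80):
    """True when the alignment covers more than `minimum` of the query."""
    return query_coverage(align_start, align_end, query_length) > minimum
```

For a hypothetical 250-residue query, an alignment over residues 21-240 covers 88% of the query and passes the test, while one over residues 1-90 covers 36% and does not, whatever its percent identity.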

The requirement for similarity over a large region is more 
evident when three-dimensional structures are examined. 
TRYP_BOVIN (structure not shown), TRYP_STRGR 






LOCUS        Description                           len  score  E(51,780) 

TRYP_BOVIN   trypsinogen (EC 3.4.21.4)               -      -      - 
TRY2_HUMAN   trypsinogen II                          -      -      - 
TRYP_PLEPL   trypsin                                 -      -      - 
KLK2_HUMAN   glandular kallikrein 2                  -      -      - 
RVVA_VIPRU   Vipera russelli proteinase              -      -      - 
TRY1_ANOGA   trypsin 1                               -      -      - 
TRYA_DROME   trypsin alpha                           -      -      - 
FA9_RAT      coagulation factor IX                   -      -      - 
PLMN_PIG     plasminogen                             -      -      - 
TRY5_ANOGA   trypsin 5                               -      -      - 
TRYP_FUSOX   trypsin                                 -      -      - 
FA7_RABIT    coagulation factor VII                  -      -      - 
URTB_DESRO   salivary plasminogen activator β        -      -      - 
ACRO_PIG     acrosin                                 -      -      - 
PRTC_HUMAN   protein C                               -      -      - 
TRYM_CANFA   mastocytoma protease                    -      -      - 
TRYP_STRGR   trypsin                                 -      -      - 
HGF_HUMAN    hepatocyte growth factor prec.          -      -      - 
ACH1_LONAC   achelase I protease                     -      -      - 
CERC_SCHMA   cercarial protease                      -      -      - 
CO2_HUMAN    complement C2                           -      -      - 
CFAB_MOUSE   complement factor B                     -      -      - 
PRTZ_BOVIN   vitamin K-dependent protein Z           -      -      0.015 

LORI_MOUSE   loricrin                                -      -      0.24 
GSEP_BACLI   glutamyl endopeptidase                  -      -      0.45 
KRUC_SHEEP   keratin, ultra high-sulfur matrix     182    107      1.3 
PRLA_LYSEN   alpha-lytic protease                  397    107      3.1 
AGI_URTDI    lectin/endochitinase precursor        372    105      3.9 
KCR8_YEAST   prob. serine/threonine-protein kin.   603    107      4.7 
G156_PARPR   156G surface protein precursor        715    117      5.0 
YLK3_CAEEL   putative ser./thr.-protein kinase     895    114      5.4 
AMY_CLOAB    putative alpha-amylase                469    104      5.7 
AGI_HORVU    root-specific lectin precursor        212     98      6.2 
YB9X_YEAST   hypothetical trp-asp repeats          878    105      9.5 
PRTS_MOUSE   vitamin K-dependent protein S         675    103      9.8 
DLK_HUMAN    delta-like protein                    383     99      9.9 
PRTB_STRGR   streptogrisin B (S. gris. prot. B)    299     94     16. 
PRTA_STRGR   streptogrisin A (S. gris. prot. A)    297     85     64. 

(- : entry not legible in the scanned copy.) 

Fig. 2. Serine protease search: high-scoring sequences. High-scoring sequences from a search of SwissProt (Bairoch and Boeckmann, 1991; release 33, April 
1996) with TRYP_BOVIN. Only 10% of the database sequences with E() < 10 [exponent illegible] are shown. Trypsin-like serine proteases with E() > 0.02 are in italics. 



(Figure 5, 1sgt) and PRTA_STRGR (1sgc) share a very 
similar all-β fold with symmetrical β barrel structures and 
two short α helices. Very little of this structure is directly 
involved in forming the catalytic triad in the active site; yet 
the entire fold is conserved, thus requiring conservation of an 
amino acid sequence that adopts this fold. 

Although almost all vertebrate trypsin-like serine proteases 
share significant sequence similarity with bovine trypsin, 
most bacterial serine proteases do not. For example, the 
similarity score for alignment of bovine trypsin with S.griseus 
protease A is not statistically significant (E() < 64), even 
though the structures of the two enzymes are very similar 
(Figure 5). Thus, while statistically significant similarity 
generally implies common ancestry, and thus common three- 
dimensional structure [the most common exceptions to this 
rule are regions with very low amino acid complexity, 
e.g. YSGGGGSSCGGGYSGGGGSSCGGGSSGGG from 
LORI_MOUSE (Altschul et al., 1994)], lack of statistically 
significant similarity does not imply non-homology. 

Figure 5 also shows the structures of two non-homologous 
proteins. Subtilisin (1sbt) is included because it is an example 
of 'convergent' evolution (Doolittle, 1994); subtilisin uses the 
same triad of catalytic residues (Asp, His and Ser) to cleave 
peptide bonds, but shares no structural similarity beyond the 
geometry of the active site of the enzyme. Subtilisin and 
subtilisin-like serine proteases are not homologous to the 
trypsin-like serine proteases. As expected, the different 
structures share no statistically significant sequence similarity 
(1500 random sequences from SwissProt would be expected 
to have a better similarity score than that obtained in the 
trypsin/subtilisin comparison). 

Likewise, high-scoring sequences that are not homologous 
to trypsin-like serine proteases rarely share structural 
similarity to the family, despite their 'strong' similarity. 
Wheat germ agglutinin (7wga) is the most similar non-serine 
protease sequence in the NRL_3D database of sequences 
whose structures are known, yet it does not contain a single β 
sheet. With the exception of membrane-spanning proteins, 
which frequently share hydrophobic regions with other 
unrelated membrane proteins, high sequence similarity— in 






>>TRYP_STRGR TRYPSIN PRECURSOR (EC 3.4.21.4) (SGT) (259 aa) 
Smith-Waterman score: 410; 34.211% identity in 228 aa overlap 

[Residue-by-residue alignment of TRYP_BOVIN and TRYP_STRGR: not legible in the scanned copy.] 

Fig. 3. Alignment of bovine trypsinogen (TRYP_BOVIN) and S.griseus trypsin (TRYP_STRGR). Shaded boxes indicate the 
[illegible] that is the third component of the catalytic triad. Unshaded boxes indicate the [remainder of caption illegible]. 


the absence of homology— provides no information about 
structural similarity. 

Using statistical significance to explore distant 
relationships 

A major advance in sequence identification by similarity 
searching has been the development of accurate statistical 
estimates for similarity scores (Altschul et al., 1994). Since 
the similarity score from comparison of TRYP_BOVIN and 
TRYP_STRGR has an expectation value of E() < 10^-20, we 
conclude that these two sequences share similarity that would 
never be obtained by chance (or obtained once in 10^20 
searches of a database the size of SwissProt), and thus their 
similarity reflects a common ancestry for the two sequences. 
Current versions of the FASTA package of sequence 
comparison programs (version 2 and 3) include accurate 
statistical estimates for both FASTA and SSEARCH (Smith- 
Waterman) similarity scores (Pearson, 1996). Careful analy- 
sis of the high-scoring non-homologous sequences can be 
used both to confirm that the statistical estimates are reliable 
and to explore distantly related members of a protein family. 

Identifying the highest-scoring non-homologous sequences 
in a database search may seem difficult if the protein family 
is very diverse. However, additional searches with high- 
scoring, but possibly unrelated sequences can be used to 
separate high-scoring unrelated sequences from distantly 
related sequences. Additional searches with high-scoring 
unrelated sequences will typically produce 'matches' with 
unrelated sequences, while additional searches with distantly- 
related sequences will produce 'matches' to protein family 
members. If the statistical estimates are accurate, high- 
scoring unrelated sequences will have E() values of ~1.0, 
since one highest scoring sequence is expected in every 
search. If the E() values for the highest scoring unrelated 
sequences are unexpectedly low and the sequences do not 
contain low-complexity simple sequence repeats, additional 
searches can be carried out with higher gap penalties. 
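
One way to read these expectation values is sketched below. The sketch assumes only what the review states: E() is the number of unrelated library sequences expected to score at least as well in one search, so roughly 1/E() searches are needed before such a score turns up by chance, and E() < 0.02 is treated as statistically significant while 0.02 < E() < 20 marks high-scoring but non-significant matches worth re-examining. The function names and category labels are invented for this example.

```python
def searches_per_chance_hit(e_value):
    """Approximate number of database searches before a score this good
    is expected once by chance (the reciprocal of the expectation value)."""
    if e_value <= 0:
        raise ValueError("E() must be positive")
    return 1.0 / e_value


def classify_match(e_value, significant=0.02, worth_a_look=20.0):
    """Bin a match by its E() value using the thresholds quoted in the text."""
    if e_value < significant:
        return "significant"
    if e_value < worth_a_look:
        return "candidate"      # high-scoring but not significant: search again
    return "unremarkable"
```

With the values reported for the bovine trypsin search, TRYP_STRGR (E() < 10^-20) falls in the first bin and PRLA_LYSEN (E() = 3.1) in the second, which is the group the review suggests following up with additional searches.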

Bovine trypsin (TRYP_BOVIN) shares statistically sig- 
nificant similarity with every full-length mammalian serine 
protease, but the bacterial alpha-lytic protease (PRLA_LYSEN) 
and S.griseus protease A and protease B do not share 
significant similarity with bovine trypsin. There is no 
question that these proteins are homologous to the mamma- 
lian trypsin-like enzymes because of their strong structural 
similarity (Figure 5). However, in the absence of high- 
resolution structural data, how can one decide whether 
a high-scoring, but not significantly similar, sequence is 
homologous? 

Additional searches with the highest scoring, non-signifi- 
cant matches allow us to identify additional members of the 
family. A search with PRTZ_BOVIN, which has a marginally 
significant score, shows strong similarity (E() values < 10 [exponent illegible]) 
with a variety of other members of the family, thus 
confirming its homology. LORI_MOUSE gives a different 
result; while many serine proteases are highly ranked with 






LOCUS        E()      % ident. 

TRYP_BOVIN   0        100.0 
TRY2_HUMAN   0         75.0 
TRYP_PLEPL   0         45.7 
KLK2_HUMAN   0         43.5 
RVVA_VIPRU   -         40.9 
TRY1_ANOGA   -         39.9 
TRYA_DROME   -         42.1 
FA9_RAT      -         40.9 
PLMN_PIG     -         40.8 
TRY5_ANOGA   -         38.7 
TRYP_FUSOX   -         41.6 
FA7_RABIT    -         37.2 
URTB_DESRO   -         38.2 
ACRO_PIG     -         35.7 
PRTC_HUMAN   -         34.5 
TRYM_CANFA   -         37.5 
TRYP_STRGR   -         34.2 
HGF_HUMAN    -         31.6 
ACH1_LONAC   -         33.5 
CERC_SCHMA   -         26.9 
CO2_HUMAN    -         26.1 
CFAB_MOUSE   -         24.0 
PRTZ_BOVIN   0.015     25.2 

LORI_MOUSE   0.24      33.7 
GSEP_BACLI   0.45      20.6 
KRUC_SHEEP   1.3       27.9 
PRLA_LYSEN   3.1       21.5 
AGI_URTDI    3.9       26.1 
KCR8_YEAST   4.7       33.3 
G156_PARPR   5.0       31.2 
YLK3_CAEEL   5.4       25.9 
AMY_CLOAB    5.7       23.3 
AGI_HORVU    6.2       24.8 
YB9X_YEAST   9.5       32.3 
PRTS_MOUSE   9.8       28.4 
DLK_HUMAN    9.9       34.2 
PRTB_STRGR   16.       24.0 
PRTA_STRGR   64.       23.4 

(- : entry not legible in the scanned copy. The bars showing the extent of each alignment along the query are not reproduced.) 

Fig. 4. Serine protease alignments. The alignments of each of the high-scoring sequences reported in Figure 2 [illegible] 
TRYP_BOVIN query sequence. Thus, alignment of TRYP_BOVIN with itself extends from the beginning to the [illegible] 
TRYP_BOVIN and TRYA_DROME extends over 85% of the TRYP_BOVIN query sequence. Members of the family with E() > 0.02 are italicized. The 
expectation value and percent identity are also shown. The ssearch -m 4 option was used to produce this figure. 



significant similarity, the sequence alignments contain a 
repeated glycine and serine motif. Thus, LORI_MOUSE is 
not homologous; it contains an unusual simple amino acid 
repeat sequence. On the other hand, GSEP_BACLI shares 
strong similarity with several bacterial serine proteases (E() < 
10 [exponent illegible]) and weaker, but significant similarity with TRYP_SACER 
and TRYP_FUSOX, Streptomyces and yeast trypsins 
with very strong similarity to bovine trypsin. GSEP_BACLI 
is, therefore, a member of the trypsin-like serine protease 
family. 

A search with alpha-lytic protease reveals a second group 
of closely related serine proteases, which includes S.griseus 
protease A and protease B. While none of the sequences in 
Figure 2 have significant similarity with PRLA_LYSEN, 
GLUP_STRGR, an S.griseus glutamyl endopeptidase, shares 
strong similarity with the S.griseus protease A and B, alpha- 
lytic protease, and weaker, but significant similarity with 
TRYA_DROME and several other Drosophila serine pro- 
teases (Figure 6). The insect sequences share strong similarity 
to mammalian trypsin-like serine proteases (Figure 2). Thus, 
by carefully exploring sequences with high, but not 
statistically significant, similarity scores, it is possible to 
construct statistically significant links between very distantly 
related serine proteases. 

Distant sequence relationships can thus be established by 
moving from sequence A to significantly similar sequence B, 








S. griseus trypsin (1sgt)       E() < 10^-20   34%   228/259 
S. griseus protease A (1sgc)    E() < 64       25%   199/297 
Subtilisin (1sbt)               E() < 1500     25%   132/275 
Agglutinin (7wga)               E() < 57       24%   104/171 

Fig. 5. Structures: homologous, convergent and unrelated. The structures of 
two members (1sgt, 1sgc) of the trypsin-like serine protease family are 
shown, along with subtilisin (1sbt), a non-trypsin-like serine protease, and 
wheat germ agglutinin (7wga), one of the highest scoring non-serine 
proteases in the NRL_3D database (release 20) of sequences whose structures 
are known. Serine protease structures are aligned to present a similar view of 
the catalytic site. The expectation values shown are based on a comparison of 
bovine trypsin (TRYP_BOVIN) to the SwissProt (release 33) protein 
sequence database. Also shown are the percent identity and the length of 
the similar region with respect to the length of the sequence of the structure 
shown. 



and then from B to C, even though A does not share 
significant similarity with C. The strategy is effective because 
of the implicit evolutionary tree that connects all the members 
of a protein family. Thus, in Figure 7, a sequence on a 
relatively short branch, TRYA_DROME, can be used to 
establish significant relationships with very diverse members 
of the family. For large and diverse protein families, it is 
usually easy to identify a number of 'less-divergent' family 
members that can be used to link distant branches of the tree. 
Naturally, such inferences are more reliable if statistically 
significant similarity scores are produced with different sets 
of scoring matrices and gap penalties, and if they are 
established with several different linking sequences. 
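
The linking strategy described here amounts to searching for a path through a graph whose edges are statistically significant pairwise similarities. The sketch below illustrates that idea with a breadth-first search; it is not a procedure given in the review. The function names are invented, the 0.02 cutoff is the significance threshold quoted in the text, and the E() values in the usage note are illustrative placeholders rather than figures read from the paper.

```python
from collections import deque


def significant_links(pair_evalues, threshold=0.02):
    """Build an undirected graph from pairs whose E() is below threshold."""
    graph = {}
    for (a, b), e_value in pair_evalues.items():
        if e_value < threshold:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def linking_path(graph, start, goal):
    """Shortest chain of significant similarities from start to goal,
    or None when no such chain exists."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in sorted(graph.get(path[-1], ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None
```

Given placeholder values such as 1e-30 for TRYP_BOVIN/TRYA_DROME, 3e-3 for TRYA_DROME/GLUP_STRGR, 1e-30 for GLUP_STRGR/PRTA_STRGR and 64 for the direct TRYP_BOVIN/PRTA_STRGR comparison, the search returns the chain TRYP_BOVIN, TRYA_DROME, GLUP_STRGR, PRTA_STRGR even though the direct comparison is not significant. As the text notes, such an inference is more reliable when several independent linking sequences give the same result.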

A phylogenetic tree was produced from selected vertebrate, 
invertebrate and prokaryotic trypsin-like serine proteases. 
Sequences were aligned using ClustalW (Thompson et al., 
1994) and protein distances estimated and distance trees built 
using the PHYLIP package (Felsenstein, 1989). The three 
numbers to the right of the sequence names report the 
statistical significance of the alignment score between the 
sequence and bovine trypsin (TRYP_BOVIN), Drosophila 
trypsin A (TRYA_DROME) and S.griseus glutamyl 
endopeptidase (GLUP_STRGR), respectively. MPR_BACSU 
is an example of another sequence that links eukaryotic and 
prokaryotic serine proteases, although it does not share 
statistically significant similarity with the three query 
sequences used for expectation values here. 

Summary 

Protein sequence comparison is the most powerful tool 
available today for inferring structure and function from 
sequences because of the constraints of protein evolution — a 



LOCUS        Description                      len  score  E(51,934) 

GLUP_STRGR   glutamyl endopeptidase II        188   1223  0 
SFA1_STRFR   serine protease 1                357   1019  0 
PRTA_STRGR   streptogrisin A                  297    681  0 
PRTB_STRGR   streptogrisin B                  299    624  10^-30 
SFA2_STRFR   serine protease 2                174    583  10^-28 
PRLA_LYSEN   alpha-lytic protease             397    349  10^-14 
SP1_RARFA    serine protease I                525    297  10^-10 
TRYA_DROME   trypsin alpha                    256    160  0.0031 
LORI_HUMAN   loricrin                         316    157  0.0057 
LORI_MOUSE   loricrin                         481    160  0.0058 
TRYB_DROME   trypsin beta                     253    152  0.009 
AIDA_ECOLI   adhesin AIDA-I                  1286    155  0.032 
TRYD_DROME   trypsin delta                    253    139  0.054 
GSEP_BACLI   glutamyl endopeptidase           316    140  0.059 
TRYG_DROME   trypsin gamma                    253    138  0.061 
TRYP_FUSOX   trypsin                          248    135  0.091 
APMU_PIG     apomucin mucin core protein     1150    144  0.13 
SLAP_CAUCR   S-layer paracrys. surf. prot.   1025    142  0.15 
TRY4_LUCCU   trypsin alpha-4                  255    130  0.19 

Fig. 6. From glutamyl endopeptidase to TRYA_DROME. 



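[Tree diagram: branching order not recoverable from the scanned copy. Sequence labels and the three E() values printed beside them, 
for comparison with TRYP_BOVIN / TRYA_DROME / GLUP_STRGR, are listed below; - marks a value that is not legible.] 

TRYP_BOVIN   0.0 / 10^-30 / - 
KLK2_HUMAN   0.0 / 10^-22 / - 
FA7_RABIT    10^-27 / 10^-17 / - 
ACH1_LONAC   10^-16 / 10^-21 / - 
TRYA_DROME   10^-31 / 0.0 / 0.003 
TRYP_STRGR   10^-20 / 10^-20 / - 
MPR_BACSU    44 / 0.19 / 2.7 
PRLA_LYSEN   3.1 / 99 / - 
GLUP_STRGR   87 / 10^-3 / 0.0 
PRTB_STRGR   16 / 0.5 / 10^-30 
PRTA_STRGR   64 / 0.04 / 0.0 

Fig. 7. Similarity and homology: a serine protease family tree. 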



protein must fold into a functional structure — which are 
reflected in its sequence. Protein sequence similarity can 
routinely be used to infer relationships between proteins that 
last shared a common ancestor 1-2.5 billion years ago. Our 
ability to identify distantly related proteins has improved over 
the past 5 years with the use of optimized scoring parameters 
(Pearson, 1995) and the development of accurate statistical 
estimates. In using sequence similarity to infer homology, 
one should remember the following. 

1. Always compare protein sequences if the genes encode 
proteins. Protein sequence comparison will typically double 
the look-back time over DNA sequence comparison. 

2. Homologous sequences are usually similar over an entire 
sequence or domain. Matches that are >50% identical in a 
20-40 amino acid region frequently occur by chance. 

3. While most sequences that share statistically significant 
similarity (E() < 0.02 ) are homologous, many distantly 
related homologous sequences do not share significant 
homology. (Significant similarity in low-complexity regions 
does not imply homology.) 

4. By focusing on the statistical significance of a similarity 
and identifying the highest scoring unrelated sequence in a 
database search, you can both confirm that the statistical 
estimates are accurate and potentially identify distantly 
related family members. 

5. Homologous sequences share a common ancestor, and thus
a common protein structure. Depending on the evolutionary 
distance and divergence path, two or more homologous 
sequences may have very few absolutely conserved residues. 
However, if homology has been inferred between A and B, 
between B and C, and between C and D, A and D must be 
homologous, even if they share no significant similarity when 



compared directly. In evaluating the results of a similarity 
search, remember that there is an evolutionary tree that 
connects the family members. 
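
The transitive reasoning in point 5 (A~B, B~C and C~D imply that A and D are homologous) can be sketched as a grouping of pairwise inferences into families. This is only an illustrative sketch: the function name and the example labels are hypothetical, and it assumes every listed pair has already passed a significance test.

```python
from collections import defaultdict


def homology_families(pairs):
    """Group sequence names into families from pairwise homology inferences.

    Uses a small union-find structure, so homology is treated as transitive.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    families = defaultdict(set)
    for name in list(parent):
        families[find(name)].add(name)
    return sorted((sorted(f) for f in families.values()), key=lambda f: f[0])


# Hypothetical inferences: A and D are never compared directly.
inferred = [("A", "B"), ("B", "C"), ("C", "D"), ("X", "Y")]
print(homology_families(inferred))
```

A and D end up in the same family even though no direct A-D comparison was supplied, which is the point the text makes about the evolutionary tree connecting family members.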

Motifs revisited 

This review argues that sequence similarity searching, rather 
than motif identification, is the most reliable method for 
identifying distantly related protein sequences. However, 
motif searches are frequently used to characterize a newly
determined sequence. While motifs can be very valuable for 
identifying functional sites in a protein, one must be very 
careful in basing sequence identifications on motif patterns 
alone. Thus, if a newly determined protein sequence contains 
the G-D-S-G-G motif, but does not share strong similarity 
(E() < 20) with any of the hundreds of trypsin-like serine 
proteases in the protein databases, is it likely to be 
homologous to trypsin and share the same protein fold? It
seems unlikely, since so many very distantly related members 
of the family are known. However, if a protein sequence
shares high, but not significant (0.02 < E() < 20) sequence 
similarity with several distantly related members of the 
family, the presence of the two motifs in Figure 1 would 
provide strong supporting evidence that a new branch in the 
serine protease family had been found. 

Alternatively, if a sequence shares significant similarity 
with proteins from several branches of the serine protease 
family tree, but does not contain the G-D-S-G-G motif, it is 
very likely that it adopts the serine protease protein fold, 
although it may not function as a protease. Thus, when 
enzymatic mechanisms are known, motifs can be used to 
confirm functional aspects of homologous proteins. However, 
in the absence of strong similarity to any member of a large 
protein family, motifs are unreliable for inferring protein 
homology. 
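
The decision logic of this section can be sketched roughly as follows. The thresholds 0.02 and 20 come from the text; everything else is an assumption made for illustration, including the function names, the reading of "several" as at least two hits, and the use of the G-D-S-G-G motif alone (the second motif of Figure 1 is not reproduced here).

```python
import re

SIGNIFICANT_E = 0.02     # "statistically significant" cutoff quoted in the text
BORDERLINE_E = 20.0      # upper bound of "high, but not significant"
MIN_BORDERLINE_HITS = 2  # assumption: how many hits count as "several"

MOTIF = re.compile("GDSGG")  # the G-D-S-G-G motif discussed in the text


def classify_evalue(e_value):
    """Bin an expectation value using the two cutoffs from the text."""
    if e_value < SIGNIFICANT_E:
        return "significant"
    if e_value < BORDERLINE_E:
        return "borderline"
    return "none"


def assess(sequence, family_evalues):
    """Combine similarity statistics with motif presence (illustrative only)."""
    has_motif = MOTIF.search(sequence) is not None
    classes = [classify_evalue(e) for e in family_evalues]
    if "significant" in classes:
        if has_motif:
            return "homologous; motif consistent with protease function"
        return "homologous fold likely; may not function as a protease"
    if has_motif and classes.count("borderline") >= MIN_BORDERLINE_HITS:
        return "possible new family branch; motif is supporting evidence"
    return "no homology inferred; motif alone is unreliable"


# Hypothetical query carrying the motif but with no similarity to the family.
print(assess("MKTAGDSGGPLV", [55.0, 140.0]))
```

The ordering of the checks mirrors the argument: similarity statistics are consulted first, and the motif only modifies a conclusion that similarity already supports.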

References 

Altschul,S.F., Boguski,M.S., Gish,W. and Wootton,J.C. (1994) Issues in
    searching molecular sequence databases. Nature Genet., 6, 119-129.
Bairoch,A. (1991) PROSITE: a dictionary of sites and patterns in proteins.
    Nucleic Acids Res., 19(Suppl.), 2241-2245.
Bairoch,A. and Boeckmann,B. (1991) The SWISS-PROT protein sequence
    data bank. Nucleic Acids Res., 19(Suppl.), 2247-2249.
Bult,C.J. et al. (1996) Complete genome sequence of the methanogenic
    archaeon, Methanococcus jannaschii. Science, 273, 1058-1073.
Doolittle,R.F. (1994) Convergent evolution: the need to be explicit. Trends
    Biochem. Sci., 19, 15-18.
Doolittle,R.F. (1995) The multiplicity of domains in proteins. Annu. Rev.
    Biochem., 64, 287-314.
Dujon,B. (1996) The Yeast Genome Project: what did we learn? Trends
    Genet., 12, 263-270.
Felsenstein,J. (1989) PHYLIP - Phylogeny Inference Package (Version 3.2).
    Cladistics, 5, 164-166.
Fleischmann,R.D. et al. (1995) Whole-genome random sequencing and
    assembly of Haemophilus influenzae Rd. Science, 269, 496-512.
Fraser,C.M. et al. (1995) The minimal gene complement of Mycoplasma
    genitalium. Science, 270, 397-403.
Pearson,W.R. (1995) Comparison of methods for searching protein sequence
    databases. Protein Sci., 4, 1145-1160.






Pearson,W.R. (1996) Effective protein sequence comparison. Methods
    Enzymol., 266, 227-258.
Smith,T.F. and Waterman,M.S. (1981) Identification of common molecular
    subsequences. J. Mol. Biol., 147, 195-197.
Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: improving
    the sensitivity of progressive multiple alignment through sequence
    weighting, position-specific gap penalties and weight matrix choice.
    Nucleic Acids Res., 22, 4673-4680.



Received on November 21, 1996; revised on January 10, 1997; accepted on
January 28, 1997 



Exhibit 3 0 



Protein Science (1995), 4:337-360. Cambridge University Press. Printed in the USA.
Copyright © 1995 The Protein Society 



REVIEW 

Structural basis of substrate specificity 
in the serine proteases 



JOHN J. PERONA¹ and CHARLES S. CRAIK

Departments of Pharmaceutical Chemistry and Biochemistry & Biophysics, 
University of California, San Francisco, California 94143-0446 

(Received July 25, 1994; Accepted December 28, 1994) 



Abstract 

Structure-based mutational analysis of serine protease specificity has produced a large database of information
useful in addressing biological function and in establishing a basis for targeted design efforts. Critical issues
examined include the function of water molecules in providing strength and specificity of binding, the extent to which
binding subsites are interdependent, and the roles of polypeptide chain flexibility and distal structural elements
in contributing to specificity profiles. The studies also provide a foundation for exploring why specificity
modification can be either straightforward or complex, depending on the particular system.

Keywords: enzyme kinetics; macromolecular recognition; protein engineering; protein-ligand interactions; pro- 
tein structure; serine protease; site-directed mutagenesis; substrate specificity 



Serine proteases were among the first enzymes to be studied ex- 
tensively (Neurath, 1985). Interest in this family has been main- 
tained in part by an increasing recognition of their involvement 
in a host of physiological processes. In addition to the biological
role played by digestive enzymes such as trypsin, serine proteases
also function broadly as regulators through the proteolytic
activation of precursor proteins (Neurath, 1984; Van de Ven



Reprint requests: Charles S. Craik, Departments of Pharmaceutical
Chemistry and Biochemistry & Biophysics, University of California, San
Francisco, California 94143-0446; e-mail: craik@cgl.ucsf.edu.

¹ Present address: Department of Chemistry and Interdepartmental
Program in Biochemistry and Molecular Biology, University of California,
Santa Barbara, California 93106.

Abbreviations: APPI, amyloid β-protein precursor inhibitor domain;
BAP, Bacillus alcalophilus alkaline protease; BLAP, Bacillus lentus alkaline
protease; BPTI, bovine pancreatic trypsin inhibitor; CMK, chloromethyl
ketone; HNE, human neutrophil elastase; hGH, human growth
hormone; Nva, norvaline, a linear three-carbon side chain; PAI-I,
plasminogen activator inhibitor I; pNA, para-nitroanilide; PPE, porcine
pancreatic elastase; PROK, Thermus album proteinase K; RMCPI and
RMCPII, rat mast cell proteases I and II; SBPN, Bacillus amyloliquefaciens
subtilisin BPN'; SCARL, Bacillus licheniformis subtilisin Carlsberg;
SGPA, Streptomyces griseus protease A; SGPB, S. griseus protease
B; SGPE, S. griseus protease E; SSI, Streptomyces subtilisin inhibitor;
suc, succinyl; suc-AAPX-pNA, tetrapeptide amide substrates varying
at the P1 position; suc-XAPF-pNA, tetrapeptide amide substrates varying
at the P4 position; THERM, Thermus vulgaris thermitase; TPA, tissue
plasminogen activator. Nomenclature for the substrate amino acid
residues is Pn, ..., P2, P1, P1', P2', ..., Pn', where P1-P1' denotes
the hydrolyzed bond. Sn, ..., S2, S1, S1', S2', ..., Sn' denote the
corresponding enzyme binding sites.



et al., 1993). Examples of this regulation include the processing
of trypsinogen by enteropeptidase to produce active trypsin
(Huber & Bode, 1978) and the cascades of zymogen activation
that control blood clotting (Davie et al., 1991). Serine proteases
have also been recently shown to play essential roles in cell
differentiation. For example, the Drosophila trypsin-like enzymes
Easter and Snake are important components in the specification
of ventral and lateral patterns during development (Chasan &
Anderson, 1989). Asymmetry of cell fates may be the result of
a protease cascade involving both of these enzymes (Smith &
DeLotto, 1994).

An alternative rationale for the continued interest in serine
proteases has been their emergence as one of the major paradigms
for the understanding of enzymic rate enhancements and
of structure-activity relationships. Until recently, all of the
known enzymes fell into one of two distinct structural classes:
the chymotrypsin-like and subtilisin-like families (Matthews,
1977; Fig. 1A,B). However, the crystal structure of wheat serine
carboxypeptidase II (Liao & Remington, 1990; Liao et al.,
1992; Fig. 1C) reveals conservation of the essential features of
the catalytic apparatus within a third distinct protein fold. This
homodimeric enzyme possesses the α+β fold found also in a
number of other enzymes that share hydrolytic activity as their
only common feature (Ollis et al., 1992). The fold consists of
an 11-stranded mixed β-sheet structure surrounded by 15 helices,
with the active site located at the base of a deep bowl-shaped
depression in the enzyme surface (Fig. 1C).

The three serine protease classes are distinguished by the absence
of any conserved secondary and tertiary motifs, but in







Fig. 1. Diversity of structural motifs in which the common catalytic
apparatus of serine proteases is embedded. Shown are ribbon drawings of
chymotrypsin (A), subtilisin BPN' (B), and wheat serine carboxypeptidase
(C). α-Helices are shown as red cylinders and β-strands as yellow
arrows. Secondary structures were determined by the algorithm of
Kabsch and Sander (1983). Each enzyme possesses two common residues
of crucial importance to catalysis: a nucleophilic Ser and an adjacent
His, which functions as a general base (shown in white). Enzymes
are oriented identically by superposition of the backbone atoms and Cβ
of these two amino acids. A third member of the catalytic machinery
is an aspartate residue (shown at left, also in white) not conserved in
position relative to the Ser and His (compare serine carboxypeptidase with
the other two enzymes). Lack of conservation in position of this residue
suggests that the catalytic apparatus may be better viewed as a
juxtaposition of Ser-His and His-Asp dyads, rather than as a single catalytic
triad.



J. J. Perona and C. S. Craik 

each case, the catalytic serine and histidine residues maintain an
identical geometric orientation (Fig. 1). To a lesser extent, adjacent
groups that stabilize the transition state are also similarly
arranged (Wright et al., 1969; Robertus et al., 1972a, 1972b;
Liao et al., 1992). Thus, it appears that nature has arrived at the
same biochemical mechanism by separate avenues: the chymotrypsin,
subtilisin, and serine carboxypeptidase families of serine
proteases are a classic example of convergent enzyme evolution
(Matthews, 1977; Liao et al., 1992). The resemblance of serine
carboxypeptidase to other members of the α/β-hydrolase fold
family also indicates the operation of divergent evolution within
this structural framework (Ollis et al., 1992). Further, a recently
generated catalytic antibody has been characterized that catalyzes
the stereoselective hydrolysis of norleucine and methionine
phenyl esters (Guo et al., 1994). The crystal structure of this enzyme
reveals the presence of a Ser-His catalytic dyad structurally
similar to those of the other serine protease classes (Zhou
et al., 1994). A similar catalytic mechanism is therefore suggested,
indicating that the antibody fold may well be a fourth
structural framework capable of supporting proteolytic activity
in a serine protease-like fashion.

We consider here the structural and kinetic basis for the diversity
of substrate specificity in the subtilisin and chymotrypsin-class
serine proteases. Emphasis is placed on those systems for
which both crystallographic and detailed kinetic measurements
are available. After a brief review of the common mechanism
of the three classes and the role of mutational analysis in its further
elucidation, we concentrate much of our attention on the
three enzymes subtilisin BPN', α-lytic protease, and trypsin. In
each case, an extensive structure-function analysis has been applied
to address the roles of particular amino acids in contributing
to the observed specificity profiles. The wealth of information
available on the chemical and kinetic mechanisms of catalysis
and the large data base of homologous sequences provide an essential
framework that supports these studies. Although the
functional and/or structural properties of many of the mutant
proteases can be given a relatively straightforward and objective
description, there are also many examples where the data
cannot be easily encapsulated. In these cases, some subjectivity
in the description of kinetic and structural parameters is unavoidable,
and other interpretations of the same data could yield
different overall conclusions.

The catalytic mechanism

The vast majority of early studies on the serine proteases focused
on the elucidation of the chemical and kinetic mechanisms of
catalysis (reviewed by Bender & Killheffer, 1973; Blow, 1976;
Kraut, 1977; Polgar, 1989). Hydrolysis of ester and amide bonds
proceeds by an identical acyl transfer mechanism in all enzymes
of the subtilisin and trypsin families (Fig. 2A,B,C). Michaelis
complex formation is followed by attack on the carbonyl carbon
atom of the scissile bond by the eponymous serine of the
catalytic triad, which is enhanced in nucleophilicity by the presence
of an adjacent histidine functioning as a general base catalyst.
Proton donation by the histidine to the newly formed
alcohol or amine group then results in dissociation of the first
product and concomitant formation of a covalent acyl-enzyme
complex. The deacylation reaction occurs via the same mechanistic
steps, with the attacking nucleophile provided by a water
molecule that approaches from the just-vacated leaving group


Substrate specificity in serine proteases 







[Fig. 2C scheme labels: acylation rate-limiting; deacylation rate-limiting. The accompanying
expressions for kcat and KM are not legible in the extracted text.]

Fig. 2. Chemical and kinetic mechanisms of catalysis for serine proteases.
The catalytic groups of trypsin (A) and subtilisin (B) are shown
interacting with an oligopeptide substrate binding to the P1-P4 sites.
(Nomenclature for the substrate amino acid residues is Pn, ..., P2, P1,
P1', P2', ..., Pn', where P1-P1' denotes the hydrolyzed bond. Sn, ...,
S2, S1, S1', S2', ..., Sn' denote the corresponding enzyme binding sites
[Schechter & Berger, 1968].) Note the distinction in residues that form
the oxyanion hole; in subtilisin, part of the interaction is made by an
enzyme side chain. The binding site for the oligopeptide also differs; in
subtilisin it forms the central strand of a three-stranded antiparallel
β-sheet. The S1 site of trypsin and the S1 and S4 sites of subtilisin are the
major sites where mutagenesis has been used to probe specificity. C:
Common kinetic mechanism of catalysis for serine proteases indicating
the meaning of the mechanistic rate constants and their relationship to
the Michaelis parameters. The correct interpretation of kcat and KM differs
depending on the rate-limiting step in catalysis, which varies among
the different enzymes as well as among differing substrates of the same
enzyme.
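
The dependence of the Michaelis parameters on the rate-limiting step can be illustrated with the standard steady-state expressions for a three-step acyl-enzyme scheme (binding, acylation, deacylation). The symbols `k2` (acylation), `k3` (deacylation) and `Ks` (substrate dissociation constant) are assumed notation, since the figure's own labels did not survive extraction, and the numerical values are hypothetical.

```python
def michaelis_parameters(k2, k3, Ks):
    """Return (kcat, Km) for E + S <-> ES -> acyl-E + P1 -> E + P2.

    k2: acylation rate constant, k3: deacylation rate constant,
    Ks: dissociation constant of the Michaelis complex.
    """
    kcat = k2 * k3 / (k2 + k3)
    Km = Ks * k3 / (k2 + k3)
    return kcat, Km


# Hypothetical rate constants (s^-1) and Ks (M), for illustration only.
slow_acylation = michaelis_parameters(k2=1.0, k3=1000.0, Ks=1e-3)
slow_deacylation = michaelis_parameters(k2=1000.0, k3=1.0, Ks=1e-3)

for label, (kcat, Km) in [("acylation rate-limiting", slow_acylation),
                          ("deacylation rate-limiting", slow_deacylation)]:
    print(f"{label}: kcat={kcat:.3g} s^-1, Km={Km:.3g} M, kcat/Km={kcat / Km:.3g}")
```

When acylation is slow, kcat approaches `k2` and Km approaches `Ks`; when deacylation is slow, kcat approaches `k3` and Km falls below `Ks`. In both cases the ratio kcat/Km equals `k2/Ks`, which is one reason specificity comparisons are usually made on kcat/Km.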



side. Each step proceeds through a tetrahedral intermediate, 
which resembles in structure the high-energy transition state for 
both reactions. This mechanism is capable of accelerating the 
rate of peptide bond hydrolysis by a factor of more than 10' 
relative to the uncatalyzed reaction (Kahne & Still, 1988).

Extensive structural evidence obtained from X-ray crystallographic
and NMR investigations has provided conclusive corroboration
of the essential features of this mechanism (reviewed by
Steitz & Shulman, 1982). The investigations have been favored
by the availability of good ground-state and transition-state substrate
analogs, which have been used to obtain high-resolution
images of these interactions. The scissile bond of the substrate
is bound directly adjacent to the Ser-His catalytic couple in all
the complexes studied. A strong hydrogen bond between these
two amino acids, necessary to subsequent proton transfer, is
formed only after substrate is bound. A binding site for the oxyanion
of the intermediate is formed by the Gly 193 and Ser 195
backbone amide nitrogens in the chymotrypsin-like enzymes
(Fig. 2A), by one amide nitrogen and the Asn 155 side chain in
the subtilisin family (Fig. 2B), and by the backbone amides of
Tyr 147 and Gly 53 in the serine carboxypeptidases (Liao et al.,
1992). The interactions made in the S1-S4 enzyme sites (see
Fig. 2 legend for substrate nomenclature) by the P1-P4 positions
of substrate form an antiparallel β-sheet hydrogen bonding arrangement
in the chymotrypsin and subtilisin families. Because
the active site of wheat serine carboxypeptidase II does not possess
similarly exposed peptide backbone groups, it seems likely
that substrate binding N-terminal to the scissile bond will occur
in a different fashion in this family (Liao et al., 1992). Another
unique structural feature of carboxypeptidase is an
extensive hydrogen bonding network, which interacts with the
C-terminal carboxylate of the substrate, essential to its activity
as an exopeptidase (Mortenson et al., 1994).

Mutational analysis of both subtilisin and trypsin has con- 
firmed the essential roles of Ser 195 and His 57 in providing rate 
acceleration. Replacement of the catalytic Ser 221 and His 64
residues of subtilisin with alanine results in decreases of lO'*- 
IO*^-fold in k^„, (Carter & Wells, 1987, 1988). A decrease of 10*- 
fold when the two residues are simultaneously replaced with 
alanine showed that the two catalytic moieties function in a 
highly cooperative manner: mutation of either component re- 
duces activity to a baseline level. Similar results were obtained 
by analogous mutations of Ser 195 and His 57 in rat trypsin 
(Corey & Craik, 1992). This study also showed that enzyme vari- 
ants such as H57K and H57E, which might provide an alterna- 
tive general base, were ineffective, further underscoring the 
importance of the native catalytic triad geometry. These exper- 
iments, as well as others involving replacement of Ser 195 with 
a Cys (Higaki et al., 1989; McGrath et al., 1989) and engineering
a metal-actuated activity switch involving His 57 (Higaki
et al., 1990; McGrath et al., 1993), clarify the role of these
active-site moieties. The mutational data are in agreement with early
chemical modification experiments, which also indicated that 
Ser 195 and His 57 play crucial roles in catalysis (Dixon et al., 
1956; Shaw et al., 1965).

The residual activity remaining in subtilisin after removal of 
the catalytic moieties was attributed to remaining binding determinants
that stabilized the transition state complex. One such
determinant is provided by a hydrogen bonding interaction of 
Asn 155 with the oxyanion intermediate. Mutation of Asn 155 
to a variety of other amino acids resulted in lO^-lO^-fold decreases
in kcat/Km (Bryan et al., 1986; Wells et al., 1986; Carter
& Wells, 1990). This provides support for the proposals made
on the basis of crystallographic studies, which suggested that a
weak hydrogen bond to Asn 155 in the Michaelis complex is
strengthened in the transition state (Robertus et al., 1972b;
Poulos et al., 1976). Interestingly, mutation of Thr 220 of subtilisin
showed that it stabilizes the transition state by 2 kcal/mol
despite the fact that the side-chain Oγ lies 4.0 Å from the oxyanion,
too far for a direct interaction (Braxton & Wells, 1991).
One explanation for the influence of Thr 220 was proposed to
be that dynamic fluctuations of the protein structure (Rao et al.,
1987) cause transient direct interactions to occur. An alternative
suggestion was that the oriented Thr 220 side-chain dipole
may stabilize the transition state at a distance, by influencing
the electrostatic potential in the active site. Significant perturbation
of the pKa of the catalytic His 64 results from mutation
of charged surface residues some 12-20 Å distant from the active
site (Russell et al., 1987; Loewenthal et al., 1993). Similar
mutation of distant charged residues affects the stability of complex
formation with a transition-state analog inhibitor (Jackson
& Fersht, 1993). These observations support the hypothesis that
long-range electrostatic interactions may play a small but significant
role in stabilizing the catalytic transition state.
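
Energetic statements such as "stabilizes the transition state by 2 kcal/mol" are conventionally related to fold-changes in kcat/Km through ΔΔG = RT ln(ratio). The sketch below applies that textbook relation; the temperature of 298.15 K and the function names are assumptions, not values taken from the studies cited.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold_change(fold_change, temperature_k=298.15):
    """Transition-state stabilization energy (kcal/mol) for a fold-change in kcat/Km."""
    return R_KCAL * temperature_k * math.log(fold_change)


def fold_change_from_ddg(ddg_kcal, temperature_k=298.15):
    """Fold-change in kcat/Km that corresponds to a given energy (kcal/mol)."""
    return math.exp(ddg_kcal / (R_KCAL * temperature_k))


# The 2 kcal/mol quoted for Thr 220, expressed as a rate ratio at the assumed temperature.
print(round(fold_change_from_ddg(2.0)))
# A hypothetical 1000-fold loss in kcat/Km, expressed as an energy.
print(round(ddg_from_fold_change(1000.0), 2))
```

On this scale, 2 kcal/mol corresponds to a change in kcat/Km of a few tens of fold near room temperature, which gives a sense of how a residue too distant for direct contact can still matter kinetically.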

Considerable controversy has surrounded the role of an ad- 
ditional component of the catalytic apparatus, a conserved bur- 
ied aspartate residue first described in the crystal structure of 
chymotrypsin (Matthews et al., 1967; Blow et al., 1969). Mu- 
tation of this residue confirmed its essential role, because all 
variants of trypsin and subtilisin in which the aspartate is ab- 
sent are decreased in catalytic efficiency by at least a factor of 
10" (Craik et al., 1987; Sprang et al., 1987; Carter & Wells, 
1988; Corey & Craik, 1992). The early suggestion of a two- 
proton transfer model, in which the Asp accepts a proton to be- 
come uncharged in the transition state, now appears to be 
unsupported by the bulk of the experimental (Bachovchin & 
Roberts, 1978; Markley, 1979; Kossiakoff & Spencer, 1981) as
well as theoretical (Warshel et al., 1989) evidence. One role for 
the conserved Asp appears to be ground-state stabilization of 
the required tautomer and rotamer of the catalytic His (Craik 
et al., 1987; Sprang et al., 1987). In addition, because the His
imidazole ring acquires a proton in the transition state, the Asp 
carboxylate can provide compensation for the developing pos- 
itive charge. Its role may therefore be considered similar to that 
of the hydrogen bond donor groups in the oxyanion hole, which 
compensate the developing negative charge on the substrate carboxyl
oxygen atom (Warshel et al., 1989; Fig. 2A,B). Experimental
evidence for the role of electrostatic stabilization of the
trypsin transition state has been obtained by mutation of the 
conserved Ser 214, which forms a solvent-inaccessible hydrogen 
bond to Asp 102, to various charged and uncharged amino acids 
(McGrath et al., 1992). Decreases in the free energies of catalysis
were in agreement with electrostatic calculations, based on
crystal structures of the mutants, which predicted these losses 
of activity. 

Comparative analysis of the structures of chymotrypsin, sub- 
tilisin, and serine carboxypeptidase shows that the precise geo- 
metric orientation of the Asp is not conserved relative to the 
Ser-His catalytic diad (Liao et al., 1992; compare Fig. 1A,B,C).
In contrast to chymotrypsin and subtilisin, the plane of the Asp 
carboxylate in carboxypeptidase is tilted far out of the plane of 
the His imidazole, such that the His-Asp hydrogen bond is 45° 



out of the carboxylate plane. This geometry is unfavorable for 
proton transfer from His to Asp and provides further evidence 
against the double proton-transfer mechanism. A detailed anal- 
ysis of high-resolution subtilisin structures also showed differ- 
ences in the Asp-His hydrogen bonding relative to trypsin 
(McPhalen & James, 1988). It now appears that the Asp can oc- 
cupy virtually any position relative to the Ser-His diad. There- 
fore, it may be more accurate to regard the operation of the 
serine protease catalytic machinery as two diads (Ser-His and
His-Asp) that operate in concert, rather than as a single catalytic
triad (Liao et al., 1992). In this context, it is of interest to
note that relocation of the Asp 102 carboxylate group to position
214 in trypsin significantly reconstitutes the activity lost in
the variants D102S and D102N (Corey et al., 1992). The crystal
structure of this mutant shows that Asp 214 still interacts with
His 57, but in an altered geometric orientation in which the plane
of the carboxylate is displaced from that of the imidazole ring
by 40°. The relatively high catalytic efficiency of this variant thus
supports the view of the catalytic apparatus as a juxtaposition
of two diads.
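
Geometric comparisons such as a carboxylate plane "displaced from that of the imidazole ring by 40°" amount to an angle between two planes, each defined by three atoms. The following is a minimal sketch of that calculation with made-up coordinates; it is not the procedure used in the crystallographic studies cited, and the helper names are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def plane_normal(p1, p2, p3):
    """Normal vector of the plane through three non-collinear points."""
    return _cross(_sub(p2, p1), _sub(p3, p1))


def interplanar_angle(plane_a, plane_b):
    """Angle in degrees (0-90) between two planes, each given as three points."""
    n1 = plane_normal(*plane_a)
    n2 = plane_normal(*plane_b)
    cos_angle = abs(_dot(n1, n2)) / math.sqrt(_dot(n1, n1) * _dot(n2, n2))
    return math.degrees(math.acos(min(1.0, cos_angle)))


# Made-up coordinates: a reference plane and a second plane tilted 40 degrees about the x axis.
tilt = math.radians(40.0)
ring_plane = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
carboxylate_plane = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, math.cos(tilt), math.sin(tilt)))
print(round(interplanar_angle(ring_plane, carboxylate_plane), 1))
```

With real structures the three points would be atoms of the carboxylate group and of the imidazole ring taken from the coordinate file.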

Substrate specificity in the subtilisin family 

The catalytic machinery and substrate binding clefts of the
subtilisin-class serine proteases are embedded in a single-domain
molecule (Wright et al., 1969; McPhalen & James, 1988). Six
crystal structures are available in this family: Bacillus
amyloliquefaciens subtilisin BPN' (Novo) (Wright et al., 1969; McPhalen
& James, 1988), Bacillus licheniformis subtilisin Carlsberg
(Bode et al., 1986a; McPhalen & James, 1988), Thermus vulgaris
thermitase (Gros et al., 1989), Thermus album proteinase K
(Betzel et al., 1988), Bacillus lentus alkaline protease (Betzel
et al., 1992), and Bacillus alcalophilus alkaline protease (van der
Laan et al., 1992). The central core of the globular heart-shaped
molecule is formed by a seven-stranded parallel β-sheet (Fig. 1B).
Nine α-helices are packed against the sheet in a mostly antiparallel
fashion relative to the β-strands; seven of these are on the
same face and form the larger of two subdomains defined on either
side (McPhalen & James, 1988). A two-stranded antiparallel
β-sheet is also formed in the larger subdomain near the
C-terminus of the chain. The active site is located in the larger
subdomain adjacent to the central β-sheet; the catalytic Ser 221 is
found near the amino-terminus of a long α-helix, which follows
the small antiparallel sheet (Fig. 1B; McPhalen & James, 1988;
numbering system for SBPN is used throughout).

Nearly all of the secondary structure elements of the enzymes
are very highly conserved. A central core of 194 amino acids has
been defined by comparison of the known structures, which contains
nearly all of the conserved α-helices and β-strands (Siezen
et al., 1991). The fungal-derived PROK deviates most significantly
in structure but still superimposes these equivalent Cα atoms
with RMS deviation of about 0.9 Å (the other prokaryotic
enzymes superimpose at 0.4 Å to 0.65 Å; Siezen et al., 1991).
If PROK is omitted, a more extended core of 232 amino acids
can be defined among the bacterial species of known structure.
An extensive sequence comparison of 47 subtilisin-class enzymes
showed a subdivision into two subclasses, based on conserved
differences in certain parts of the alignment. SBPN, SCARL,
THERM, BAP, and BLAP are members of subclass I; the structurally
divergent PROK is a representative of subclass II (Siezen
et al., 1991). Although the homologous catalytic core of some







270 amino acids is found in all subtilisins, some of the enzymes
possess large insertions in this domain, and many also possess
C-terminal extensions resulting in polypeptide chains as long as
1,775 amino acids. This large database of sequence information
forms the basis for homology modeling of those enzymes for
which no tertiary structure is available (Siezen et al., 1991, 1993).
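
The RMS deviations quoted for superimposed Cα atoms follow from a simple formula once two structures have been aligned and their equivalent atoms paired. The sketch below assumes the superposition has already been done and uses made-up coordinates; the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (same units as input) between paired atoms.

    Both inputs are equal-length sequences of (x, y, z) tuples for
    equivalent atoms of two structures that are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Made-up coordinates (in angstroms) for three paired C-alpha atoms.
structure_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
structure_2 = [(0.0, 0.5, 0.0), (3.8, -0.5, 0.0), (7.6, 0.5, 0.0)]
print(rmsd(structure_1, structure_2))
```

In practice the optimal superposition is found first (for example by a least-squares fit over the conserved core), and the deviation is then reported over that core only, as in the 194-residue comparison described above.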

Crystal structures of enzyme-inhibitor complexes have identified
substrate binding determinants extending over nine amino
acids, from P6 to P3'. The structures include several peptide
chloromethyl ketone complexes, in which subsites P1-P3 are occupied
(Robertus et al., 1972a; Poulos et al., 1976), as well as
complexes of SCARL with the protein inhibitor eglin C (Bode
et al., 1986a; McPhalen & James, 1988), SBPN with eglin C,
chymotrypsin inhibitor 2 and Streptomyces subtilisin inhibitor
(Bode et al., 1986a; McPhalen & James, 1988; Takeuchi et al.,
1991a, 1991b), THERM complexed to eglin C (Gros et al.,
1989), and PROK complexed with peptide inhibitors (Betzel
et al., 1993). In each of these complexes, the inhibitor chain
binds in a surface channel of the enzyme, which accommodates
six residues from P4 to P2'. On the N-terminal side of the scissile
bond, the P1-P4 residues of the substrate main chain are
invariably inserted between two β-strands of the enzyme at positions
125-127 and 100-102 (Fig. 2B). The substrate thus forms
the central strand of a three-stranded antiparallel sheet unique
to the subtilisins; in the chymotrypsin-like proteases, this structure
is not formed because only the strand corresponding to residues
125-127 is present (Fig. 2A).
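
The P/S subsite nomenclature used throughout (Pn, ..., P1 on the N-terminal side of the scissile bond and P1', ..., Pn' on the C-terminal side) can be made concrete with a small labeling helper. The function name, the example peptide, and the choice to pass the cleavage point as a residue count are all illustrative assumptions.

```python
def label_subsites(peptide, residues_before_bond):
    """Map Schechter-Berger position labels to residues of a peptide.

    peptide: one-letter amino acid sequence, N- to C-terminal.
    residues_before_bond: number of residues N-terminal to the scissile bond.
    """
    if not 0 < residues_before_bond < len(peptide):
        raise ValueError("the scissile bond must lie between two residues")
    labels = {}
    for index, residue in enumerate(peptide):
        if index < residues_before_bond:
            labels[f"P{residues_before_bond - index}"] = residue
        else:
            labels[f"P{index - residues_before_bond + 1}'"] = residue
    return labels


# Hypothetical hexapeptide cleaved after its fourth residue, spanning P4 to P2'.
print(label_subsites("AAPFGL", 4))
```

For this example the six residues occupy P4, P3, P2, P1, P1' and P2', the same span that the surface channel of the subtilisins is described as accommodating.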

Subtilisins in general show broad substrate specificity profiles
and often display a preference for large hydrophobic groups at
position P1 (Markland & Smith, 1971). At this position specificity
arises from a broad open S1 binding cleft formed on one
side by the two β-strands, which interact with the P1-P4 substrate
residues, and on the other by a loop comprising residues
155-166 (Fig. 3). This loop varies in size among members of the
family (Siezen et al., 1991). In SBPN, two different modes of
binding exist to accommodate either P1-Phe or P1-Lys substrates
(Robertus et al., 1972a; Poulos et al., 1976). The Phe ring
binds deeply in the S1 cleft near Gly 166, whereas the charged
Lys extends across the cleft to form a salt bridge with Glu 156.
A prominent hydrophobic cavity is also present for binding of
the P4 substrate side chain (Fig. 3). These two sites have been
the focus of much of the work on substrate specificity. Interactions
made in the more distal sites influence catalytic efficiency
markedly, and there is evidence for nonadditivity of mutational
effects suggesting a functional communication between sites
(Gron & Breddam, 1992).

Interactions in the S1 site

The most intensively studied member of the subtilisin family is
SBPN, which has been the subject of extensive protein engineering
investigations (reviewed in Wells et al., 1987b; Wells & Estell,
1988). The enzyme efficiently cleaves peptidyl amide substrates
possessing a broad range of P1 amino acids, with the kcat/Km
value showing a linear dependence on the hydrophobicity of the
substrate side chain. The preference of the enzyme at this position
is roughly Tyr, Phe > Leu, Met, Lys > His, Ala, Gln, Ser >>
Glu, Gly (Estell et al., 1986; Wells et al., 1987c). To investigate
the role of hydrophobicity more closely, 12 different amino acids
were substituted for Gly 166, which lies at the base of the pocket
(Fig. 3). Analysis of the mutants showed that an increase in the



Fig. 3. Structure of the S1 and S4 sites of subtilisin BPN' showing binding
of a peptide derived from the cocrystal structure with Streptomyces
subtilisin inhibitor. An α-carbon trace of the protein is shown in thin
blue lines. Catalytic residues are in yellow, and the inhibitor chain is in
green with the P1 and P4 side chains labeled in blue. Locations of amino
acids at which the S1 and S4 sites have been mutated are indicated in
red. In the subtilisin family, both the S1 and S4 sites are generally specific
for hydrophobic side chains, but Glu 156 in the S1 site of subtilisin
BPN' provides activity toward P1-Lys side chains as well. At both
sites, specificity alteration is readily achievable by the substitution of
a small number of residues directly in contact with substrate. Modulation
of the hydrophobic specificity profiles has been achieved at both
sites, and altered specificity toward charged residues has been achieved
in the S1 pocket.



side-chain volume at this position, which consequently decreases 
the size of the SI cleft, caused substantial reductions (up to 
5,000-fold) in A^v/ZC^, toward large PI amino acids. This pre- 
sumably occurs due to sieric repulsion, which predominates over 
the favorable effect of a more hydrophobic pocket. Catalytic 
efficiencies toward small PI side chains were increased by up 
to 10-fold in these variants. An optima! combined volume for 
the SI and PI side chains of 160 A' was estimated from these 
data. It appears that hydrophobicity of the SI site is the main 
driving force for specificity, whereas other effects, such as at- 
tractive van der Waals forces and hydration of polar side chains, 
have a lesser though still significant role. 

Because these studies showed that specificity is easily modulated
by replacing amino acids directly contacting substrate, it
seemed plausible that more distant portions of the enzyme structure
might be of little importance. This idea was further explored
by a mutational study in which several amino acids from the
related SCARL enzyme were exchanged for those in SBPN (Wells
et al., 1987a). Although these two enzymes differ by 31% in
sequence, only three substitutions lie within 7 Å of the S1 pocket.
Two of these, at positions 156 and 217 (Fig. 3), directly contact
substrate (residue 217 is in the S1' site). A third residue at
position 169 is positioned behind the loop comprising residues
156-166, which forms one side of the S1 pocket. In SCARL the amino
acids are Ser 156, Ala 169, and Leu 217; these replaced the
analogous Glu 156, Gly 169, and Tyr 217 of SBPN. The wild-type
enzymes differ by factors of 6-60-fold in their kcat/Km values
toward peptidyl amide substrates possessing P1-Glu, Met, Phe,
Gln, or Ala; in each case, SBPN is more efficient (Wells et al.,
1987a).

The triple mutant E156S/G169A/Y217L was found to exhibit
a substrate specificity profile very similar to that of SCARL.
Cleavage at each of the P1 amino acids tested occurred with
efficiencies within threefold of the target protease (Wells et al.,
1987a). These data demonstrate that, of the 86 amino acid
differences between the two enzymes, three alone are largely
sufficient to determine the differences in specificity. Further,
analysis of singly and doubly substituted variants showed that the
E156S mutation is alone almost entirely responsible for the shift
in specificity profile. Because the activity of the E156S/Y217L
enzyme was found to be within twofold of the triple mutant, it
appears that P1 substrate specificity is in fact locally determined
to a significant degree.

The behavior of the E156S variant is similar to that of other
mutant SBPN enzymes also possessing electrostatic substitutions
in the S1 site (Table 1; Wells et al., 1987c). Sixteen variants were
constructed at positions 156 and 166, each of which altered the
electrostatic potential of the S1 site by introducing or removing
Arg, Lys, Glu, or Asp residues at one or both sites. Analysis
of the mutants showed that increases as high as 10^3-fold in
kcat/Km toward complementary charged substrates could be
achieved. To assess the contribution of electrostatic free energy
to the stabilization of the transition-state complex, parallel
substitutions of roughly isosteric but uncharged residues (Met
replacing Lys; Gln replacing Glu) were also made. For example,
it was found that increasing the positive charge in the S1 site
increases kcat/Km much more for P1-Glu than for P1-Gln
substrates. In this way, substrate binding effects associated solely
with the charge-charge interaction could be isolated.

Table 1. Engineering electrostatic interactions in subtilisin^a

Residues at 156/166   Net charge   P1-Glu   P1-Lys
E156/D166                 -2                16,200
E156/N166                 -1           40   17,800
E156/Q166                 -1           16   12,600
S156/D166                 -1           17   17,400
E156/G166 (wt)            -1           35   39,800
Q156/G166                  0          620    1,070
Q156/N166                  0          110    5,600
E156/R166                  0          810    1,550
Q156/K166                 +1       66,000    1,700
S156/K166                 +1       16,200    5,400

^a Substrate: suc-Ala-Ala-Pro-Glu/Lys-pNA. Values are kcat/Km, s^-1 M^-1.
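
As a worked illustration of the electrostatic analysis, the kcat/Km
values of Table 1 can be converted into fold changes relative to wild
type and into approximate transition-state stabilization energies using
ddG = -RT ln(fold change). The sketch below is not taken from the cited
studies: the temperature of 298 K, the function names, and the choice of
wild type as the reference are assumptions made for illustration only.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal mol^-1 K^-1
TEMP_K = 298.0      # assumed assay temperature; not stated in the text

# kcat/Km values (s^-1 M^-1) as listed in Table 1, keyed by the residues
# at positions 156/166: (net charge, P1-Glu, P1-Lys).
# None marks an entry for which Table 1 lists no value.
TABLE_1 = {
    "E156/D166": (-2, None, 16200),
    "E156/N166": (-1, 40, 17800),
    "E156/Q166": (-1, 16, 12600),
    "S156/D166": (-1, 17, 17400),
    "E156/G166": (-1, 35, 39800),   # wild type
    "Q156/G166": (0, 620, 1070),
    "Q156/N166": (0, 110, 5600),
    "E156/R166": (0, 810, 1550),
    "Q156/K166": (1, 66000, 1700),
    "S156/K166": (1, 16200, 5400),
}

WILD_TYPE = "E156/G166"


def fold_change(variant, substrate_index, reference=WILD_TYPE):
    """Ratio of kcat/Km (variant over reference) for one substrate column.

    substrate_index is 1 for P1-Glu and 2 for P1-Lys.
    """
    value = TABLE_1[variant][substrate_index]
    ref = TABLE_1[reference][substrate_index]
    if value is None or ref is None:
        return None
    return value / ref


def ddg_transition_state(fold, temp_k=TEMP_K):
    """Change in transition-state stabilization energy in kcal/mol.

    Negative values mean the variant stabilizes the transition state
    more than the reference does.
    """
    return -R_KCAL * temp_k * math.log(fold)


if __name__ == "__main__":
    for name in TABLE_1:
        for label, idx in (("P1-Glu", 1), ("P1-Lys", 2)):
            fold = fold_change(name, idx)
            if fold is None:
                continue
            print(f"{name:10s} {label}: {fold:10.3g}-fold, "
                  f"ddG = {ddg_transition_state(fold):+.2f} kcal/mol")
```

For the Q156/K166 variant with the P1-Glu substrate, this gives a roughly
1,900-fold increase over wild type, or about -4.5 kcal/mol at the assumed
298 K.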

Several of the S1-site specificity variants were also utilized in
a different study that addressed the ability of SBPN to function 
as a peptide ligase (Abrahmsen et al., 1991). This reaction oc- 
curs when peptides bearing a free amino-terminal group can 
compete effectively with water for attack on the acyl-enzyme in- 
termediate. The intrinsic low level of ligase activity normally 
present in SBPN was enhanced by substitution of the active-site 
Ser 221 by Cys, which shifts the relative preference toward am- 
inolysis by more than lO-^-fold (Nakatsuka et al., 1987). The 
additional mutation P225A improves ligase activity by an
additional 10-fold (Abrahmsen et al., 1991). The usefulness of this
SBPN variant (referred to as subtiligase) for the synthesis of
proteins was improved by introducing specificity variants G166I,
G166E, and E156Q/G166K into the S221C/P225A framework.
Preferred ligation of P1-Glu, P1-Phe, P1-Lys, and P1-Arg
esters was achieved; the specificity for ligation mirrored that for
cleavage of peptidyl amide substrates (Estell et al., 1986; Wells
et al., 1987c). The ability to modulate the S1-site specificity thus
provides greater flexibility in the choice of ligation junctions.
Subtiligase has been used to synthesize ribonuclease A and
active-site variants of this enzyme by stepwise ligation of six
esterified peptide fragments 12-30 residues long (Jackson et al.,
1994).

Substrate-assisted catalysis 

Substrate-assisted catalysis represents a strategy for enhancing
the specificity of proteolytic cleavage. Subtilisins lacking the
catalytic His 64 can be reconstituted by including a histidine
residue within the substrate (Carter & Wells, 1987; Carter et al.,
1989, 1991). By placing a His at the P2 position of peptidyl
amide substrates, specificity of up to 400-fold was achieved
relative to analogous P2-Gln and P2-Ala substrates. The increased
specificity at position P2 occurs within the context of a
compromised enzyme; H64A subtilisin is reduced 10^6-fold in kcat/Km,
and H64A in the presence of a P2-His substrate remains 5,000-fold
less efficient than the wild-type enzyme (Carter & Wells,
1987). Mutation of Ser 221, Asp 32, and Asn 155 in the context
of H64A suggested that interactions of the catalytic His with the
Ser and Asp residues are severely compromised when the His is
present in the substrate (Carter et al., 1991). By contrast, the
oxyanion hole interactions appear much less disrupted. Model
building of P2-His substrates indicates that the imidazole ring
can occupy roughly the same position as that of His 64 in the
native enzyme, although some deviation in hydrogen bond
distances and angles exists, which may partially explain the reduced
activity.

The large database of S1-site specificity variants was again
used to enhance the selectivity of proteolytic cleavage by the
prototype H64A enzyme (Carter et al., 1989). For example, an
improvement of 20-fold in cleavage of suc-FAHY-pNA was
observed by introducing the S1 and S1'-site mutations E156S,
G169A, and Y217L (Estell et al., 1986; Wells et al., 1987c), which
increase catalytic efficiency toward P1-Phe and P1-Tyr
substrates. The additional mutation G166A enhanced specificity for
P1-Phe but not P1-Tyr substrates, as expected because the Cβ
of Ala 166 appears to cause steric hindrance to the binding of
the larger Tyr side chain. Little specificity was observed on the
C-terminal side of the peptide bond in the cleavage of peptide
substrates. The mutant subtilisins have been shown to selectively
cleave designed target sites in fusion proteins, even under
adverse conditions, making them a useful additional tool in the
repertoire of protein chemists (Carter et al., 1989).

Further insight into substrate-assisted catalysis was provided
by a novel approach using phage display technology (Matthews
& Wells, 1993; Fig. 4A). A randomized target substrate sequence
for an improved H64A subtilisin (Carter et al., 1989) was
inserted between an amino-terminal affinity domain representing
a variant of human growth hormone, and the carboxy-terminal
domain of the M13 phage gene III coat protein. A collection of
phage particles bearing different substrate sequences is bound
to immobilized hGH-binding protein and cleaved by subtilisin,
so that phage bearing good substrate sequences are eluted and
those bearing poor sequences remain bound. Propagation of the
phage further enriches for efficient or inefficient cleavage sites.
Analysis of the sequences that were efficiently cleaved revealed
that P1'-His as well as P2-His-containing substrates could
function in substrate-assisted catalysis. Analysis of cleavage of
fusion proteins linked to alkaline phosphatase, which provides an
easily assayed activity, suggested that P1'-His-mediated
cleavage was comparable in efficiency to P2-His cleavage. Further
study of P1'-His cleavage would be informative because release
of the leaving group after formation of the acyl-enzyme implies
that no catalytic His is present to assist in deacylation.
Molecular modeling has shown that a P1'-His can also occupy the
position vacated by His 64 in an H64A variant (Matthews & Wells,
1993).




The P4-S4 interactions 






Considerable specificity toward substrate residues distant from
the scissile bond exists in the subtilisin-class family. A thorough
mapping of the preferences of two enzymes, SBPN and BLAP,
shows that the most marked distal interaction occurs
on the N-terminal side of the substrate at the S4 enzyme site
(Grøn et al., 1992). Mutational analysis at this position has been
applied to three of the enzymes of known structure: SBPN (Eder
et al., 1993; Rheinnecker et al., 1993, 1994), BLAP (Bech et al.,
1992, 1993; Sørensen et al., 1993), and BAP (Teplyakov et al.,
1992). The S4 site is formed from the juxtaposition of two
structural elements: residues 100-107 at the amino-terminus of an
α-helix in the small subdomain and residues 125-132 in an adjacent
surface loop. Substrate interactions include both the main-chain
β-sheet hydrogen bonds as well as contacts with the side chains
of residues 104, 107, 126, and 135, which line the sides and base
of the site (Fig. 3). Of the amino acids shaping the cleft, only
Gly 127 is invariant in the family (Siezen et al., 1991).

In SBPN, the amino acid side chains in the S4 site are Tyr 104,
Ile 107, and Leu 126, which create a large hydrophobic pocket.
Accordingly, the substrate preferences follow the series Phe >
Leu, Ile, Val > Ala for cleavage of peptidyl amide substrates
(Rheinnecker et al., 1993). Slightly different preferences
following the same general trend were observed toward long peptides
occupying subsites S5-S5' (Grøn et al., 1992). However, the
kcat/Km values vary only over a three- to sixfold range.
It was suggested that the small variability might be due to
compensatory shrinkage of the S4 site upon binding of smaller side
chains (Takeuchi et al., 1991a). Efficiencies toward polar
residues are decreased by more than 100-fold relative to
hydrophobic amino acids (Grøn et al., 1992).

Fig. 4. Randomization methodologies employed in isolation of serine
protease substrate specificity mutants. A: "Substrate phage" approach
applied to subtilisin. In this method, the sequence of the substrate
rather than the enzyme is varied to explore the substrate specificity
at many of the subsites. By using H64A subtilisin as the cleaving
protease, it was discovered that substrate-assisted catalysis
functions when the substrate His is present at the P1' as well as the
P2 position. Note that in phage display systems, the phage particle
provides a "package" in which the mutant DNA and variant protein are
physically linked. This facilitates analysis after enrichment of
those phage bearing good substrate sequences. B: Genetic selection
for the isolation of trypsin variants. Periplasmic expression of a
variant trypsin capable of cleaving the nonnutritive Arg-X substrate
(1, 2) leads to release of free Arg (3), which enters the cytoplasm
and relieves auxotrophy. Twenty variant trypsins possessing altered
Arg/Lys specificity ratios have been isolated in this manner. C:
Phage display approach for the isolation of trypsin variants. A
wild-type trypsin gene fused to the M13 gene III coat protein
specifically binds immobilized ecotin, a dimeric protein inhibitor of
mammalian serine proteases that is found in the bacterial periplasm.

Tyr 104, Ile 107, and Leu 126 were mutated singly and in
combination to amino acids that in every case were smaller than the
wild-type residue. The following variant enzymes were
characterized kinetically toward amide substrates of the form
suc-XAPF-pNA: Y104F, Y104A; I107G, I107A, I107V; L126G,
L126A, L126V; and the double mutants I107G/Y104A,
I107G/L126A, and I107G/L126V (Rheinnecker et al., 1993, 1994).
These alterations test the effects of enlarging the P4 pocket as
well as the consequences of deleting a hydrogen bond present
between the side chains of Tyr 104 and Ser 130.

It was found that the Tyr 104-Ser 130 hydrogen bond has little
effect on enzyme efficiency or specificity: Y104F SBPN
hydrolyzes P4-Ala, Val, Ile, Leu, and Phe substrates nearly
identically to the wild-type enzyme. The effect of introducing
Ala at this position is similar to that caused by decreasing the
size of Ile 107: in each case specificity is increased for residues
possessing large side chains at P4. Among the single mutants at
positions 104 and 107, the largest improvements in the
specificity for P4-Phe relative to P4-Ala are roughly 200-fold
for both Y104A and I107G. For these variants, the effects are
achieved by maintaining approximately wild-type levels of
kcat/Km toward Phe and sharply decreasing efficiencies toward
Ala and the other smaller substrate residues. Mutation of Leu 126
had smaller effects on relative specificities, but large decreases 
in the range of 10-IO'*-fold were observed in kcat^^m* with de- 
creased efficiency correlated with decreasing size of the side 
chain. 

The three double mutants also showed strong preference for
large side chains at position P4 (Rheinnecker et al., 1994).
Among these enzymes, the mutant I107G/L126V improves the
P4-specificity for large side chains to 340-fold relative to P4-Ala,
but in this case the maximal discrimination was achieved with
P4-Leu rather than P4-Phe. The other two double mutants
similarly exhibited a maximal preference for P4-Leu. In all cases,
nonadditivity was observed relative to the single mutants, as
expected from the close proximity of the three side chains. Kinetic
parameters were also measured toward the single-residue
substrate acetyl-tyrosine ethyl ester, which might be considered as
a probe measuring the extent to which S4-site mutants affect the
functioning of the S1 site. Large decreases of up to 60-fold were
observed, with the largest effects occurring for the double
mutants. However, the same variants exhibit comparable efficiencies
to wild-type when measured toward favored suc-XAPF-pNA
substrates. This suggests that less productive binding may
occur in the absence of the subsite interactions, particularly
because the ester substrate is more easily cleaved owing to the
better leaving group.
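
Nonadditivity of the kind noted for the double mutants is commonly
expressed as a coupling energy in a double-mutant cycle: the free-energy
change of the double mutant minus the sum of the changes for the two
single mutants. The following is a minimal sketch of that arithmetic.
It assumes a temperature of 298 K, and the kcat/Km values in the example
are made up for illustration; they are not data from the studies cited
here, and the function names are likewise hypothetical.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal mol^-1 K^-1
TEMP_K = 298.0      # assumed temperature; not taken from the text


def ddg(kcat_km_mutant, kcat_km_wild_type, temp_k=TEMP_K):
    """Free-energy change (kcal/mol) of a mutant relative to wild type.

    Positive values mean the mutant is less efficient than wild type.
    """
    return -R_KCAL * temp_k * math.log(kcat_km_mutant / kcat_km_wild_type)


def coupling_energy(wild_type, single_a, single_b, double, temp_k=TEMP_K):
    """Coupling energy (kcal/mol) of a double-mutant cycle.

    Zero means the two mutations act additively (independently).
    A nonzero value means the effect of one mutation depends on
    whether the other is present.
    """
    additive = ddg(single_a, wild_type, temp_k) + ddg(single_b, wild_type, temp_k)
    return ddg(double, wild_type, temp_k) - additive


if __name__ == "__main__":
    # Hypothetical kcat/Km values (s^-1 M^-1), for illustration only.
    wt, mut_a, mut_b = 1.0e5, 1.0e4, 2.0e4

    # Perfectly additive case: the fold effects multiply (0.1 * 0.2).
    print(coupling_energy(wt, mut_a, mut_b, 2.0e3))

    # Nonadditive case: the double mutant is no worse than mutant A alone.
    print(coupling_energy(wt, mut_a, mut_b, 1.0e4))
```

In the second call the double mutant is fivefold more efficient than
additivity would predict, which corresponds to a coupling energy of
about -0.95 kcal/mol at the assumed temperature.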

The substrate preference of BLAP at the P4 substrate position
is also toward large hydrophobic side chains (Grøn et al.,
1992). A broader range of specificities exists than in SBPN: in
this case, a 24-fold (rather than sixfold) increase in kcat/Km
is observed when progressing from small to large hydrophobic
amino acids. The individual subsite interactions do not affect the
overall catalytic efficiencies in an additive manner, suggesting
that functional communication occurs and is mediated by
structural elements of the protein (Grøn & Breddam, 1992). For
example, modest substrate preferences at some sites are masked
if the optimal P1-Phe and/or P4-Phe residues are present. These
amino acids dominate the cleavage efficiency such that an
upper limit in kcat/Km is reached even when other subsites are
filled by nonpreferred residues. These other sites are therefore
less important when a good substrate rather than a poor
substrate is bound. This study underlines an important principle:
optimal subsite mapping of subtilisins (and other proteases)
should be carried out using sets of matched substrates where the
interdependency of binding sites is not manifested. In the case
of BLAP, the presence of an anthraniloyl group at P5 and a Pro
at P2 apparently disrupts the P1-Phe and P4-Phe interactions,
such that a substrate series containing these nonoptimal groups
permits distribution of P1' site preferences over a 15-fold range.
Only a 50% difference between the most and least favored P1'
amino acid is observed in the absence of the nonoptimal groups,
which prevents accurate mapping of the true subsite preference
(Grøn & Breddam, 1992).

The structure of the BLAP S4 pocket is similar to that of
SBPN. The side chains of Val 104, Ile 107, Leu 126, and Leu 135
form the base and one side of the pocket, whereas Ser 128,
Ser 130, and Ser 132 are situated along the outside rim with each
of the side-chain hydroxyl groups pointing inward. The
substitution of Val 104 for the Tyr present in SBPN allows Leu 135
access to the substrate in BLAP. The only other difference in
the pocket between the two enzymes is the presence of Gly 128
rather than Ser 128 in SBPN. A total of 21 mutants in the BLAP
S4 site have been constructed and analyzed (Bech et al., 1992,
1993; Sørensen et al., 1993). At position 104 it was found that
bulky hydrophobic side chains produced enzymes that
preferentially cleaved small hydrophobic side chains, and conversely,
smaller amino acids increased specificity toward large substrates.
This behavior is reminiscent of the effects caused by increasing
the size of residue Gly 166 in the S1 site of SBPN (Estell et al.,
1986; see above). Mutations at other positions in the BLAP S4
site often also showed these effects, but in many cases complex
specificity profiles not immediately interpretable in simple terms
were obtained. What does appear clear is that both steric and
hydrophobic effects play important roles in determining the S4
specificity profile (Bech et al., 1993; Sørensen et al., 1993). For
some mutants it was further suggested that structural
flexibility is also critical.

Distinguishing the degree to which hydrophobicity, steric
exclusion, and substrate-induced conformational changes function
to determine specificity profiles requires high-resolution
structural information on the mutant enzymes. Such information has
begun to be obtained in the study of BAP variants (Teplyakov
et al., 1992). Substitution of Val 104 in this enzyme with Trp
increased activity toward suc-AAPF-pNA by 12-fold. The
crystal structure of the uncomplexed variant showed that no other
structural change occurs and that the S4 site is now blocked off
such that a modeled P4-Ala residue makes a good van der Waals
contact with Trp 104. Trp 104 in this variant is oriented nearly
identically to Trp 104 in THERM, which also exhibits high
activity toward suc-AAPF-pNA.

Comparison of the structures of SSI and a P4-Met to Gly
mutant of SSI complexed to SBPN showed that the S4 site
undergoes a substantial shrinkage upon binding of P4-Gly (Takeuchi
et al., 1991b). The structural flexibility in this enzyme raises the
possibility that a capacity for such rearrangement may exist in
other members of the family as well. Required for an assessment
of the degree of flexibility, and the extent to which amino acid
alterations affect this property, are crystal structures of
wild-type and mutant enzymes complexed to substrate analogs
possessing small and large side chains at the P4 position. In the case
of BAP, for example, it would be of interest to determine the
catalytic efficiencies of the wild-type and V104W enzymes toward
larger hydrophobic P4-side chains and then to carry out a
systematic structural analysis of complexes of each enzyme with
analogous inhibitors. Such an analysis for the chymotrypsin-like
α-lytic protease has yielded substantial insight into the structural
basis for enzyme flexibility (Bone et al., 1991; see below).

Together these mutational alterations within the subtilisin S1
and S4 sites allow two important conclusions: (1) only the
local environment of amino acids directly contacting substrate
need be considered in designing specificity changes; (2) there is
no important distinction between hydrophobic and polar
enzyme-substrate interactions because each type is manipulatable to
generate new specificity profiles while maintaining high activity. The
importance of these generalizations to protein design in other
systems depends upon the extent to which the structural design
of the binding cleft, and the nature of the reaction being
catalyzed, are crucial parameters. As we shall see, structural
context can have great influence in mediating the extent to which
specificity alteration is straightforward. A clue to its important
role can be seen in the dependence of catalytic efficiency on the
extent to which subsites are filled. The signal that distal portions
of substrate are bound is transmitted over large distances and
must in some way be mediated by the intervening protein
structure. Long-range effects are key in the chymotrypsin family of
enzymes, both in terms of filling subsites as well as in
determining specificity at a single site (Corey et al., 1992; Hedstrom
et al., 1992, 1994a, 1994b; Perona et al., 1995; see below).

Prohormone convertases: Specificity
toward paired dibasic residues

Tissue-specific processing of precursor proteins in mammalian
cells is accomplished by a subfamily of subtilisin-class enzymes
known as prohormone convertases. The need for this cleavage
event to release bioactive products provides a crucial regulatory
step for the cell. Early protein sequencing studies of various
peptide hormones suggested that the dibasic sequences Lys-Lys and
Lys-Arg provided the sites of cleavage (reviewed by Lazure et al.,
1983). The first protease isolated in this class was the yeast kexin,
which cleaves with high selectivity both synthetic peptide and
protein substrates possessing Lys-Arg at the P2 and P1 sites,
respectively (Fuller et al., 1989; Brenner & Fuller, 1992).
Following isolation of the yeast enzyme a number of mammalian
species have been cloned including furin (Van den Ouweland
et al., 1990), PC1/PC3 and PC2 (Smeekens et al., 1991), and
more recently the enzymes PC4, PC5, and PACE4 (Rehemtulla
et al., 1993). The enzymes possess pro-domains and must
therefore themselves be processed prior to activation. Maturation has
been shown to occur in an autocatalytic fashion in the cases of
PC2 (Matthews et al., 1994) and of furin (Creemers et al., 1993).
These studies have now shown that most cleavage takes place
either at Lys-Arg and Arg-Arg dibasic sites, or at an
Arg-X-Lys-Arg consensus site, depending on the intracellular pathway
of localization.

Mature prohormone convertases are large enzymes that
typically possess 600-800 amino acids. In addition to the
subtilisin-like catalytic domain, they also variously possess other
structural elements such as transmembrane anchors, Ser/Thr-rich
regions, glycosylation sites and Cys-rich regions (Seidah et al.,
1991). Based on homology modeling, it was predicted that these
enzymes possess a greatly increased number of negatively charged
residues near the substrate binding cleft. Many of these amino
acids are highly conserved (Siezen et al., 1991; Fig. 5). Their
importance was tested by site-directed mutagenesis of furin, using
processing of a peptide hormone in vivo as the functional assay
(Creemers et al., 1993). The following residues were mutated:
Asp 33, Asp 61, Glu 101, Asp 104, Glu 107, Glu 129, Asp 130,
Asp 131, Asp 165, and Asp 209. Cleavage was assayed toward
the wild-type hormone precursor as well as toward three mutants
in which one of the positively charged amino acids in the
cleavage site sequence P4-Arg-P3-Ser-P2-Lys-P1-Arg was altered to
Gly or Ala. The ability of mutants to carry out autoproteolytic
activation was also assessed.

Mutation of the P1-Arg in this sequence gave rise to
prohormones that could not be processed either by wild-type or by any
of the mutant furins, suggesting that a basic residue at this
position is critical to recognition (Creemers et al., 1993). Several
of the mutants possessed preferences for one of the three
mutant prohormone substrates, implicating the Asp or Glu at that
enzyme position in recognition of the substrate residue that was
altered. Thus, Asp 33 is implicated in P2-site binding and Glu 107
in P4-site binding, in accord with modeling that predicts their
locations adjacent to these substrate positions (Siezen et al.,
1991). Mutation of Asp 165, predicted to lie at the base of the
S1 site, abolished activity, as did removal of the negative charge
from positions Glu 129, Asp 130, or Asp 131 putatively near the
P4 site. Interestingly, the roughly isosteric mutant D209L
abolished activity, despite being located some distance from the
binding cleft. By contrast, other substitutions nearer to the
substrate could be introduced without loss of activity. These furin
mutants provide the first mapping of structural determinants
affecting prohormone processing. An obvious need now exists for
an accurate three-dimensional structure of an enzyme in this
class. Together with detailed kinetic analysis of synthetic
substrates, this would provide substantial insight into the structural
determinants of this most interesting specificity.

Fig. 5. A distinct subclass of the subtilisin family of serine
proteases, the prohormone convertases, is involved in prohormone
processing in a number of important physiological contexts. The
specificity of processing is toward sites possessing 2-4 Arg and Lys
residues at the P1-P4 positions. Shown is a solvent-accessible
protein surface on which are mapped the binding determinants
specifying prohormone processing by furin. The structure is that of
subtilisin BPN' complexed to SSI because no three-dimensional
structure is yet available in this subclass. A large number of
negatively charged amino acids is found on the substrate binding
face of the enzyme (red). The catalytic triad is in blue and the
substrate is in yellow, with the P1-P4 amino acids in green.

Substrate specificity in the chymotrypsin family

As in the subtilisin family of enzymes, the diversity of substrate
specificity among the chymotrypsin-like proteases rests upon
small differences in structure in the substrate-binding cleft. All
of the chymotrypsin-like enzymes are composed of two
juxtaposed β-barrel domains, with the catalytic residues bridging the
barrels (Fig. 1A; Kraut, 1977; Steitz & Shulman, 1982; Bazan
& Fletterick, 1990). Crystal structures are available for bovine
chymotrypsin (Matthews et al., 1967), porcine pancreatic
elastase (Watson et al., 1970), bovine, rat, and Streptomyces griseus
trypsins (Ruhlmann et al., 1973; Sprang et al., 1987; Read &
James, 1988), rat tonin (Fujinaga & James, 1987), kallikrein
(Bode et al., 1983), rat mast cell protease II (Remington et al.,
1988), human neutrophil elastase (Navia et al., 1989),
thrombin (Bode et al., 1989a), factor Xa (Padmanabhan et al., 1993),
and complement factor D (Narayana et al., 1994). Additionally,
structures are available for four microbial enzymes: S. griseus
proteases A, B, and E (SGPA, Delbaere et al., 1979; SGPB,
Moult et al., 1985; SGPE, Nienaber et al., 1993), and the
Lysobacter enzymogenes α-lytic protease (Brayer et al., 1979). The
microbial enzymes share the chymotrypsin-like bilobal β-barrel
structure but are more distantly related as evidenced by their
shorter sequences and substantial structural differences in
surface loops (James, 1976). S. griseus trypsin, on the other hand,
is an example of a microbial enzyme that is more homologous
to mammalian serine proteases than to its bacterial counterparts
(Read & James, 1988).

Molecular modeling methods have been used to create a 
structure-based sequence alignment of the chymotrypsin-like ser- 
ine proteases (Greer, 1990), which is very useful in assessing sub- 
strate preferences. The specificity is usually most pronounced 
at the S I -sites of the enzymes, where the majority of sequences 
group into one of three subclasses definable by inspection of a 
small number of crucial amino acids. Position 189, located at 
the base of the SI "pocket, is very highly conserved as aii Asp 
in enzymes with trypsin-likc specificity toward Arg- and Lys- 
containing substrates (Fig. 6; chymotrypsin numbering system 
is used throughout — sec Greer, 1990). It is found as a Ser or 
other srnali amino acid in chymotrypsin and elastase-class en- 
zymes, whicli manifest specificity toward aromatic and small hy- 
drophobic amino acids, respectively. The amino acid side chains 
at positions 190 and 228 extend into the base of the pocket as 
well and play an additional role to modulate the specificity pro- 
file. Amino acids at positions 216 and 226 are usually Cly in both 
trypsin and chymotrypsin-like enzymes; larger amino acids at 
these positions partially or fully block access of large substrate 
side chains to the base of the pocket (Fig. 6). Accordingly, elas- 
tases possess larger, usually nonpolar residues at these positions, 



195 



214 



226 



Bovine Chymourypsin CAO- ASGV-SSCMGDS 

nan Trypsin CVGFLEGGKD£CQGDS 

Porcine Elastiase CAGGDG-VRSGCQGDS 

Crab Collagenase CID-STGGKGXCDGDS 



SWGS-ST-CS-TSTPGVl 
.SWG Y G — C A LP D HP G^^/^ 
SFVSRLGCHVTRKPTVE 
S FGAAAGC EA-GYPDA£ 



L>OOp 1 



Loop 2 



Fig. 6. Common architecture of the S1 site of four members of the
chymotrypsin-like class of serine proteases, with the eponymous Ser 195
catalytic residue shown in blue. An early paradigm for substrate
specificity was derived from a comparison of the S1-site structures of
trypsin (A), chymotrypsin (B), and pancreatic elastase (C). Amino acids
at positions 216 and 226 (left side of the pocket) and at 189 and 190
(right side) are indicated by van der Waals surfaces colored white for
uncharged and red for negatively charged residues. The shape and
electrostatic character of each site corroborate the specificities
toward Arg/Lys, Phe/Tyr/Trp, and Ala, respectively. Fiddler crab
collagenase (D) possesses a negatively charged Asp in an altered
position relative to trypsin. Although it might be predicted that this
enzyme possesses a trypsin-like specificity profile, it is instead
capable of efficiently cleaving P1-side chains of substrates specific
to each of the three other proteases. Amino acid sequence alignment of
these four enzymes (E) showing the distinction in primary specificity
residues (bold) and secondary determinants (underlined). Positions in
the sequence of two adjacent surface loops are also shown (see Figs. 7,
11, 13).



providing a platform for interaction with small hydrophobic substrate
P1-amino acids. The shapes of the S1 pockets of trypsin, chymotrypsin,
and elastase thus appear to readily explain the observed specificities,
leading to the canonical view that substrate preferences are in fact
determined by this limited set of amino acids (Stroud, 1974). However,
as discussed below, this perspective has now been shown to be incorrect
by the discovery that other structural elements distant from the
substrate binding site are also crucial determinants of specificity.

Kinetic measurements of substrate preferences for the two 
mammalian elastases of known structure (PPE and HNE) per- 
mit a more detailed appraisal of structure-function relationships 



Substrate specificity in serine proteases 






(Bode et al., 1989b). Both enzymes possess bowl-shaped hydrophobic S1
binding sites that accommodate small hydrophobic substrates (Watson
et al., 1970; Navia et al., 1989). However, the S1 site of PPE has been
described as slightly less hydrophobic and marginally smaller than that
of HNE (Bode et al., 1989b). PPE cleaves peptide bonds preferentially
at small P1-Ala and Nva side chains (Harper et al., 1984), whereas HNE
manifests substantial activity toward the branched-chain Val, Ile, and
Leu residues (Harper et al., 1984; Stein et al., 1987). These
preferences are in accord with the smaller S1 site of PPE, but the
small difference in size is insufficient to account for the altered
profiles. The identities of the amino acids that line the S1 pockets
differ substantially in the two enzymes, most notably by the presence
of the charged Asp 226 in HNE, which is present as a Thr in PPE. In
HNE, Asp 226 is buried by Val 216 and Val 190, and the carboxylate
group points away from substrate into a network of buried water
molecules (Navia et al., 1989). One possible explanation for the
superior ability of HNE to cleave branched-chain substrates could thus
be that the S1 site possesses greater intrinsic flexibility as a
consequence of its different construction and interaction with
surrounding portions of the structure (Bode et al., 1989b). A small
shrinkage of the S1 site is in fact observed upon binding Val relative
to Leu in this position (Bode et al., 1986b; Wei et al., 1988).

Cleavage of peptide substrates adjacent to the acidic Asp and Glu
residues is the hallmark of an additional subclass of enzymes.
Recognition of the negatively charged carboxylate is accomplished by
means of a His residue at position 213 in a number of microbial enzymes
including the Staphylococcus aureus V8 protease (Drapeau, 1978), SGPE
(Svendsen et al., 1991), and two epidermolytic toxins of S. aureus
(Dancer et al., 1990). Recently, the crystal structure of SGPE
complexed with the tetrapeptide Ala-Ala-Pro-Glu has been determined at
2.0 Å resolution (Nienaber et al., 1993). The structure reveals that
the Glu carboxylate is indeed bound directly by His 213 as well as by
the side chains of Ser 192 and Ser 216. The structure of the enzyme
also shows that His 213 is hydrogen bonded in series to two other His
residues at positions 199 and 228 to form a solvent-inaccessible His
triad that penetrates through the core of the enzyme. This remarkable
structural feature is postulated to play a role in substrate charge
compensation, by delocalizing the substrate negative charge through
proton transfer across the His residues (Nienaber et al., 1993). No
other serine protease is known to possess the His triad. An alternative
to the use of His 213 is found in a protease from cytotoxic
T-lymphocytes, which possesses an Arg at position 226 (Murphy et al.,
1988). This enzyme is unusual in its preference for cleavage at Asp
rather than Glu residues (Odake et al., 1991). Mutation of Arg 226 to
Gly, followed by qualitative assay of crude lysates in which the
variant was expressed, showed lowered activity toward peptidyl P1-Asp
thiobenzyl ester substrates and increased activity toward analogous
P1-Phe substrates (Caputo et al., 1994).

Virtually all chymotrypsin-like serine proteases share a common
feature: an S1-site specificity that is restricted to a relatively
narrow subset of the naturally occurring amino acids. It therefore came
as some surprise when one enzyme, the collagenolytic serine protease 1
from the fiddler crab Uca pugilator, was shown to possess high
catalytic activity toward each of trypsin, chymotrypsin, and
elastase-like substrates (Grant & Eisen, 1980). The specificity profile
of this enzyme has recently been reexamined in detail (Tsu et al.,
1994). Crab collagenase exhibits 5% of elastase, 10% of chymotrypsin,
and 65% of trypsin activity, as assessed by kcat/Km values toward
peptidyl amide substrates possessing Ala, Phe, and Arg, respectively,
at the P1 position. kcat values toward each of these amino acids are
extremely high. Additionally, it is the most efficient
chymotrypsin-like enzyme known toward P1-Leu and P1-Gln amide
substrates, manifesting 6-fold and 50-fold greater activities than does
chymotrypsin toward these substrates (Tsu et al., 1994). Therefore, the
chymotrypsin-like scaffold can maintain an S1 binding pocket that
accommodates a very broad range of amino acids without sacrificing
catalytic efficiency.

Crab collagenase exhibits an interesting rearrangement of a negative
charge at the base of the S1 site: residues Asp 189 and Gly 226 of
trypsin are altered to Gly 189 and Asp 226 in collagenase (Grant
et al., 1980; Fig. 6). However, this predicts a strict specificity for
P1-Lys and Arg substrates: the amino acids at positions 190 and 216 are
Thr and Gly, respectively, which allows access of the substrate to
Asp 226. As discussed above, Asp 226 of human neutrophil elastase is
buried by Val 216, leading to a hydrophobic specificity profile (Navia
et al., 1989). A possible explanation for the ability of crab
collagenase to accommodate hydrophobic as well as positively charged
substrate residues is provided by a recently refined 2.5-Å crystal
structure of the enzyme complexed with the dimeric serine protease
inhibitor ecotin (J.J. Perona, C.A. Tsu, C.S. Craik, & R.J. Fletterick,
submitted for publication). The structure shows that one carboxylate
oxygen of Asp 226 is accessible to substrate, but that the
P1-methionine residue of ecotin does not enter the S1 site and binds
instead on the surface of the enzyme adjacent to the disulfide bond at
positions 191-220. Modeling shows that the pocket can provide multiple
binding sites that accommodate diverse amino acid side chains in
distinct positions. Therefore, S1-site flexibility does not appear to
be utilized as a structural determinant in the broad specificity of
crab collagenase.
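
The residue-based reasoning used so far (charge at position 189 or 226,
access controlled by position 216) can be written as a small lookup. The
sketch below is only an illustration of that canonical heuristic: the
function name, the category labels, and the rule ordering are assumptions
made here, not a published classifier. It reproduces the point of the
text, namely that the heuristic predicts Arg/Lys specificity for crab
collagenase, whereas the observed profile is much broader.

```python
from typing import Optional


def predict_p1_preference(res189: Optional[str], res216: str, res226: str) -> str:
    """Simplified, hypothetical encoding of the canonical S1 heuristic."""
    # Gly at 216 leaves the base of the pocket accessible to large side chains.
    pocket_open = res216 == "Gly"
    # An accessible negative charge at 189 or 226 suggests basic P1 residues.
    if pocket_open and (res189 == "Asp" or res226 == "Asp"):
        return "Arg/Lys"
    # An open, uncharged pocket suggests aromatic P1 residues.
    if pocket_open and res226 == "Gly":
        return "aromatic"
    # A blocked pocket suggests small hydrophobic P1 residues.
    return "small hydrophobic"


# Residues (189, 216, 226) as stated in the text; None marks a residue
# that the text does not give.
ENZYMES = {
    "trypsin": ("Asp", "Gly", "Gly"),
    "chymotrypsin": ("Ser", "Gly", "Gly"),
    "human neutrophil elastase": (None, "Val", "Asp"),
    "crab collagenase": ("Gly", "Gly", "Asp"),
}

PREDICTIONS = {name: predict_p1_preference(*res) for name, res in ENZYMES.items()}

if __name__ == "__main__":
    for name, prediction in PREDICTIONS.items():
        print(f"{name}: predicted P1 preference = {prediction}")
```

The mismatch between the prediction for crab collagenase and its measured
activity toward Ala, Phe, and Arg substrates is the reason the text looks
beyond these three positions.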



α-Lytic protease: Exploring the role of structural
plasticity in substrate specificity

α-Lytic protease, an extracellular enzyme produced by the soil
bacterium L. enzymogenes, has been the subject of intensive analysis
aimed at relating structure to catalytic activity. This microbial
protease, while possessing the chymotrypsin-like fold comprising two
β-barrels (Brayer et al., 1979), nevertheless displays large insertions
and deletions relative to the pancreatic enzymes, resulting in an
overall RMS deviation in the positions of structurally equivalent
α-carbons of 1.36 Å for 110 of 198 amino acids, when compared with
chymotrypsin (Fujinaga et al., 1985). By comparison, the equivalent
pairwise fits with the bacterial proteases SGPA and SGPB yield RMS
deviations of roughly 0.7 Å, a value very similar to that which relates
the mammalian pancreatic enzymes to each other. The S1 pockets of
α-lytic protease and trypsin are particularly divergent in structure
(Fig. 7). An insertion of two amino acids causes Met 192 of α-lytic
protease to occupy a position similar to Ser 190 of trypsin. More
strikingly, an adjacent surface loop at positions 185-188 is deleted in
α-lytic protease, and a second nearby loop at positions 217-225 is
enlarged by eight amino acids. A consequence of these differences is
that, although both enzymes possess a disulfide bond linking the
conserved residues Cys 191 and Cys 220, the positions of the sulfur
atoms are displaced by 7-8 Å (Fig. 7).






J.J. Perona and C.S. Craik




Fig. 7. Diversity in S1-site structure between the mammalian and the
microbial trypsin-like enzymes is illustrated by a superposition of
trypsin (green) and α-lytic protease (red). Although the mammalian
enzymes such as trypsin possess two well-defined loops (loop 1 and
loop 2) joining the β-strands of the specificity pocket, in α-lytic
protease and other microbial enzymes loop 1 is absent, whereas loop 2
is greatly enlarged. Conserved disulfide bonds of each enzyme
(Cys 191-Cys 220; yellow) are displaced some 7 Å from each other. The
catalytic triad is shown at the top in green.



Kinetic data show that α-lytic protease possesses a hydrophobic
specificity profile for substrate residues in the P1 position. The
preference of the enzyme at P1, as described by relative kcat/Km
values, is roughly Ala > Met, Val, Gly > Nle > Leu > Phe for hydrolysis
of tetrapeptide amide substrates (Bauer et al., 1981; Bone et al.,
1991). The structural elements that interact with the P1-substrate side
chains comprise the three hydrophobic side chains Met 192, Met 213, and
Val 217a, which together form a shallow depression in the enzyme
surface (Brayer et al., 1979; Fujinaga et al., 1985; Fig. 8). More
recently, six crystal structures of the enzyme complexed with peptidyl
boronic acid inhibitors of the general structure R-boroX (where R is
methoxysuccinyl-Ala-Ala-Pro and boroX is the α-aminoboronic acid analog
of Ala, Val, Ile, Nle, Leu, or Phe) have been determined at resolutions
between 2.0 and 2.5 Å (Bone et al., 1987, 1989a, 1991). Boronic acids
are tight-binding (Ki's in the nanomolar range) reversible inhibitors
of serine proteases (Kettner & Shenvi, 1984) that form covalent, nearly
tetrahedral adducts with Ser 195 (Bone et al., 1987). They represent
good structural analogs of the high-energy tetrahedral intermediate
present on the actual catalytic pathway.

The crystal structures of the boronic acid complexes confirm that
covalent tetrahedral adducts are formed with Oγ of Ser 195 for the
P1-Ala, Val, Ile, Leu, and Nle inhibitors. The large P1-Phe side chain
cannot fit into the S1 site, leading to the formation of an unusual
trigonal adduct that includes His 57 (Bone




Fig. 8. Structure of the S1 site of α-lytic protease bound to the
substrate analog suc-Ala-Ala-Pro-Ala-boronic acid (red), showing the
positions of the hydrophobic amino acids Met 192, Met 213, and
Val 217a, which form a platform for binding of small hydrophobic side
chains. The three β-strands of the S1 site are shown in yellow and the
large connecting loop is in green. Catalytic groups are also in green
(top). Mutation of either Met 192 or Met 213 to Ala creates variant
enzymes possessing greatly broadened specificities toward hydrophobic
amino acids, without sacrificing catalytic efficiency.



et al., 1989a). The interactions of the inhibitor among these
structures are nearly identical with the exception of the way in which
the P1 side chains interact with Met 192, Met 213, and Val 217a. These
side chains adjust conformation in response to the differing sizes and
shapes of the inhibitor amino acids. Small shifts in the position of
adjacent main-chain atoms in the S1 and S2 specificity sites occur in
the complexes with the larger Nle and Phe. Particular importance has
been ascribed to the rearrangements at positions 217a-217d (Bone
et al., 1989a, 1991; see below). Low activity toward the larger Leu and
Phe side chains appears to arise solely from steric considerations,
whereas Met is preferred to Leu presumably owing to its greater
flexibility. Although the structural basis for the preference of Ala
relative to Val was not unambiguously clear, it was proposed that
strong binding to the oxyanion hole, required in the transition state,
is prevented for the Val substrate on steric grounds. Differences in
the electronic character of the boronate inhibitor, relative to a true
transition state, do not allow for a complete mimicking of the latter
(Bone et al., 1989a).

The substrate specificity profile of α-lytic protease was altered
dramatically by the introduction of either of two single-site mutations
in the S1 site: M192A or M213A (Bone et al., 1989b; Table 2; Figs. 8,
9). In each case, high activity toward Ala was retained, but the
increased size of the S1 pocket allowed accommodation of P1-side chains
as large as Phe, with catalytic efficiencies kcat/Km increased up to
15-fold relative to wild-type cleavage at P1-Ala. For M192A, improved
catalytic efficiencies toward P1-Met and P1-Val resulted mainly from
lowered Km values, whereas the P1-Leu and P1-Phe substrates were
improved in both kcat and Km. The catalytic activity toward P1-Leu and
P1-Phe substrates was improved by 10^3- and 10^5-fold, respectively,
relative to wild type. However, the wild-type preference of nearly
10^5-fold for P1-Ala/Phe was decreased to 30-fold in M192A and nearly
completely eliminated in M213A (Table 2). Complicating a
straightforward interpretation of the profiles of these variants were
two factors: (1) the dependence of kcat, Km, and kcat/Km was not
correlated with the size or hydrophobicity of the P1 side chain;
(2) enlargement of the pocket by the same volume in the two mutants
gave rise to considerably different functional effects. Therefore,
extensive structural analysis of the mutant enzymes complexed with the
boronic acid inhibitors was carried out to understand which factors
cause the altered specificities (Bone et al., 1989b, 1991).
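
The fold changes discussed above can be recomputed directly from the
kcat/Km values in Table 2. The sketch below is a minimal illustration;
the numbers are transcribed from the table as it appears in this copy
(reading "21.000" and similar entries as thousands is an assumption), and
the helper names are hypothetical.

```python
# kcat/Km values (s^-1 M^-1) transcribed from Table 2.
# Tuple order: wild type, M192A, M213A.
VARIANTS = ("wild type", "M192A", "M213A")
TABLE2 = {
    "Ala": (21000.0, 10000.0, 600.0),
    "Val": (790.0, 3000.0, 340.0),
    "Met": (1800.0, 35000.0, 980.0),
    "Leu": (4.1, 11000.0, 160.0),
    "Phe": (0.38, 31000.0, 340.0),
}


def fold_change(residue: str, variant: str) -> float:
    """kcat/Km of a variant divided by the wild-type value for one P1 residue."""
    values = TABLE2[residue]
    return values[VARIANTS.index(variant)] / values[0]


def preference(res_a: str, res_b: str, variant: str) -> float:
    """Ratio of kcat/Km for P1 = res_a over P1 = res_b within one enzyme."""
    idx = VARIANTS.index(variant)
    return TABLE2[res_a][idx] / TABLE2[res_b][idx]


if __name__ == "__main__":
    for residue in ("Leu", "Phe"):
        print(f"M192A vs wild type, P1-{residue}: "
              f"{fold_change(residue, 'M192A'):.3g}-fold")
    for variant in VARIANTS:
        print(f"{variant}: Ala/Phe preference = "
              f"{preference('Ala', 'Phe', variant):.3g}")
```

Working from ratios rather than raw values makes it easy to compare a
change in activity toward one substrate with a change in discrimination
between two substrates.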

The principal rationale for the exceptionally broad specificity
profiles of M192A and M213A is that the S1 site possesses structural
plasticity, which encompasses a combination of alternate side-chain
conformations as well as deformability of the main chain (Bone et al.,
1989b; Fig. 9). For example, accommodation of the P1-Phe side chain by
M192A results from a substrate-induced conformational change, in which
the side chain of Val 217a rotates to remove one carbon from the
pocket, and the main chain from Val 217a to Val 217d shifts by
0.5-0.8 Å. This permits the large inflexible aromatic ring to be nearly
completely buried in the specificity pocket. In this case, some of the
binding energy is presumably used to drive the conformational change in
the protein, a phenomenon that is also observed to lesser extents in
other mutant-inhibitor complexes. In general, hydrogen bond lengths,
buried hydrophobic surface area, unfilled cavity volume, and the
magnitude of conformational changes vary significantly among the
various mutant and wild-type complexes (Bone et al., 1991). The
energetic consequences of these differences were quantified (see Bone &
Agard (1991) for a review of the energetics of intermolecular
interactions) and correlated with free energies of catalysis for the
various mutant-substrate combinations.
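
A ratio of catalytic efficiencies is commonly converted to a difference
in free energy of catalysis through the transition-state relation
ΔΔG = -RT ln[(kcat/Km)_A / (kcat/Km)_B]. The sketch below assumes that
standard relation and a temperature of 298.15 K; neither the temperature
nor the function name comes from the text, and the cited analysis may
have used a different treatment.

```python
import math

# Gas constant in kcal mol^-1 K^-1.
R_KCAL = 1.987e-3


def ddg_from_ratio(efficiency_ratio: float, temperature_k: float = 298.15) -> float:
    """Free energy difference (kcal/mol) implied by a ratio of kcat/Km values.

    A negative result means the numerator enzyme-substrate pair has the
    lower transition-state free energy.
    """
    if efficiency_ratio <= 0:
        raise ValueError("efficiency ratio must be positive")
    return -R_KCAL * temperature_k * math.log(efficiency_ratio)


if __name__ == "__main__":
    for ratio in (10.0, 1e3, 1e5):
        print(f"ratio {ratio:g}: ddG = {ddg_from_ratio(ratio):.2f} kcal/mol")
```

On this scale each tenfold change in kcat/Km corresponds to roughly
1.4 kcal/mol at room temperature, which gives a sense of how small the
structural energy terms being compared are.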

The analysis has led to an increased understanding of the way in which
the different energetic terms can contribute to the stabilization of
the enzyme-substrate complex, although no single factor has been found
that consistently correlates well with either activity or inhibition
(Bone et al., 1991). Thus, the wild-type enzyme has a relatively
limited ability to adapt to large side chains, so that the specificity
profile is driven primarily by steric exclusion. M192A, however, is
improved in its ability to hydrolyze large side chains in part because
the degree of conformational change required for their accommodation is
reduced; further, it also possesses the ability to shrink so that
P1-Ala substrates are hydrolyzed well. By contrast, the M213A pocket
cannot contract, leading to a sharply reduced activity toward P1-Ala as
well as a reduced discrimination relative to P1-Gly (Bone et al.,
1991). In both mutants, however, the broad specificities depend on the
ability of the main chain and side chain atoms at positions 217a-217d
to readjust (Fig. 9). This flexibility is proposed to arise from a
large adjacent surface loop, which begins at residue 217a (Figs. 7, 8),
and which appears to be able to absorb structural changes in the
preceding residues. The energies of interaction of the S1 site with
this and other peripheral structural elements thus also play a
significant role in determining the specificity profiles.

Table 2. Broadening the specificity of α-lytic protease (a)

X      Wild type    M192A     M213A

Ala    21,000       10,000    600
Val    790          3,000     340
Met    1,800        35,000    980
Leu    4.1          11,000    160
Phe    0.38         31,000    340

(a) Substrate: suc-Ala-Ala-Pro-X-pNA. Values are kcat/Km, s^-1 M^-1.

Fig. 9. Principal rationale for the ability of α-lytic protease mutants
to exhibit greatly enhanced specificities toward new substrate side
chains is structural plasticity of the S1 site. Shown is a
superposition of five structures of the M192A variant of the enzyme
(the new Ala 192 side chain is at the right side). Each enzyme is
complexed with a peptidyl boronate inhibitor (not shown for clarity)
possessing a particular hydrophobic P1-side chain (see Fig. 8 for
inhibitor binding). The conformation of the active site adjusts to the
different substrates at position Gly 216 and in the following loop
region (bottom). Both side-chain and main-chain rearrangements are
important components of active-site plasticity. The ability of the
active site to adjust in this manner may be an important factor in the
ability to effect specificity modification by mutation at only a single
site.

Another recent study of α-lytic protease used random mutagenesis of
four residues in the substrate binding pocket, coupled to an activity
screen using synthetic substrates, to identify new variants with
altered specificities (Graham et al., 1993). A library was constructed
beginning with the M192A variant, with randomization of positions
Gly 192a, Arg 192b, Met 213, and Val 217a. Screening and qualitative
characterization of 47 active variants revealed that a majority of the
enzymes retained a specificity profile similar to that of the parent
M192A. Also emerging from the screen was a subclass of enzymes capable
of cleaving P1-His-containing substrates. All mutants possessing this
ability contained His 213, an amino acid heretofore correlated with
P1-Glu specificity in other microbial enzymes (Nienaber et al., 1993).
In general, residue 213 appears to play a significant role as a primary
specificity determinant in several microbial enzymes. Although this
amino acid has not yet been mutated in any mammalian protease, it
appears very unlikely that it will assume a similar role. Clearly the
divergence in structure of the S1 site in the two subclasses (Fig. 7)
has led to a more prominent role for this residue in the bacterial
enzymes, despite the fact that its position relative to the
Ser 195/His 57 catalytic couple does not vary.

Kinetic data indicate that α-lytic protease makes substrate binding
interactions over at least six subsites from P2' to P4 (Bauer et al.,
1981). Interestingly, the crystal structure shows that a small
hydrophobic pocket exists beyond the P4 side chain of the tetrapeptide
boronic acid inhibitor, formed from residues Leu 227, Leu 180, Val 167,
Ala 169, and Ser 225 (Bone et al., 1987). Although extension of a
substrate side chain to fill the S5 site does not have a significant
influence on kinetic parameters (Bauer et al., 1981), it is possible
that additional binding energy from interactions in the hydrophobic
pocket cannot be realized in catalysis unless a P6 side chain is also
bound. Little specificity has been observed at the other subsites,
although a preference for Pro at position P2 has been noted in binding
of the peptide boronic acid inhibitors (Bone et al., 1987). Although
the S2 enzyme site is hydrophobic, adjacent side-chain hydroxyl groups
of Ser 214 and Tyr 171 participate in a hydrogen bonding network, which
includes the carboxylate of Asp 102. Introduction of the mutations
S214A and Y171F caused decreases in both kcat and Km, and the data were
used to infer that the role of the two hydroxyl groups in the native
enzyme is to facilitate catalysis by maintaining the S2 site in an
optimal configuration (Epstein & Abeles, 1992).

Mutational analysis of trypsin: Combining
structural genetics, classical enzymology,
and X-ray crystallography

Trypsin represents the third serine protease that has been the subject
of extensive mutational analysis aimed at an understanding of substrate
specificity. These studies have focused largely on the origins of
specificity at the primary S1 site. At this position, trypsin
hydrolyzes amide substrates containing P1-Lys and
PI -Arg amino acids by factors of IQ - or greater relative to the 
next-preferred residues (Graf et al,, 1988; Evnin et al., 1990). 



The preference of the enzyme is 2- to 10-fold in favor of Arg- relative
to Lys-containing substrates (Craik et al., 1985; Perona et al.,
1993c). As might be expected from their structural disparity, Lys and
Arg interact in a differential manner with the primary determinants
Asp 189 and Ser 190 (Ruhlmann et al., 1973; Bode et al., 1984;
Fig. 10). The guanidinium group of P1-Arg substrates makes an ion-pair
interaction with Asp 189, whereas the interaction of P1-Lys is solely
by a water-mediated contact. Both Arg and Lys substrate side chains
also interact with Ser 190.

An early study assessed the precision with which the S1 site is
constructed by introducing small perturbations: the Gly residues at
positions 216 and 226 were converted to Ala, resulting in the three
trypsin mutants G216A, G226A, and G216A/G226A (Craik et al., 1985;
Fig. 10). Relative specificities for tripeptide amide P1-Arg/Lys
substrates, as assessed by the ratio of kcat/Km values, were altered by
up to 20-fold. Catalytic efficiencies were decreased by 40-fold to
10**-fold, and these effects involved significant decreases in kcat as
well as higher Km values. The differential effects on the kcat and Km
values resulted in enzymes that were more Arg specific (G216A) and more
Lys specific (G226A) than the wild-type enzyme. Subsequent crystal
structure determinations of trypsins G226A (Wilke et al., 1991) and
G216A (M.E. McGrath & R.J. Fletterick, unpubl. results)




Fig. 10. Role of the position of the negative charge at the base of the
trypsin S1 site has been probed by random and site-directed mutagenesis
coupled to crystal structure analysis of variants. Shown is the
structure of the S1 binding pocket of trypsin, indicating the positions
at which the negatively charged amino acid has been determined by X-ray
crystal structures. Blue, wild-type trypsin at position 189; red,
trypsin D189G/G226D at position 226; yellow, exogenously added acetate
ion in trypsin D189S (acetate reconstitutes activity toward P1-Arg and
P1-Lys-containing substrates). Wild-type amino acids at positions 216
and 226 are each Gly, permitting access of the large P1-Lys (green) and
P1-Arg side chains to Asp 189.






complexed with benzamidine showed that the alanine substitutions
produced no structural perturbations beyond the immediate vicinity of
the mutated residues. Because the catalytic triad residues Ser 195,
His 57, and Asp 102 are unaffected by these binding pocket alterations,
it is highly probable that the decreases in kcat are attributable to
altering the catalytic register of the scissile bond. These data thus
provided an early demonstration that substrate binding and catalytic
turnover are interrelated functions in trypsin, and that they can be
affected differentially to alter the function of the enzyme.

A series of studies have addressed the role of the negatively charged
Asp 189 residue in binding and catalysis. These investigations have
made use of both site-directed mutagenesis and a genetic selection
approach for the isolation of new variants (Fig. 4B). The selection is
based on expression of a library of trypsin variants into the
periplasmic space of an E. coli strain that is auxotrophic for arginine
or lysine (Evnin et al., 1990). Cells are plated on minimal media
containing a nonnutritive substrate analog of one of these amino acids;
active trypsins cleave the analog, liberating free amino acid and
thereby relieving the auxotrophy (Evnin et al., 1990; Perona et al.,
1993a).

Twenty variant trypsins have been isolated from a library of 400
possible mutants encompassing the amino acids at positions 189 and 190
at the base of the S1 site. Kinetic characterization of these enzymes,
as well as of the variants D189K (Graf et al., 1987) and D189S (Graf
et al., 1988), indicates that the presence of a negative charge at the
base of the binding pocket is essential to high-level catalysis by
trypsin. Variants lacking the negative charge are compromised in
kcat/Km toward peptidyl Arg- or Lys-containing amide substrates by a
factor of 10^ or greater. Activity toward these substrates is partially
restored by the presence of an Asp or Glu residue at positions 189 or
190. The variants span a range of catalytic efficiencies ranging from
wild type to decreases of 10^-fold (Evnin et al., 1990; Perona et al.,
1993a).
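
The figure of 400 possible mutants follows from allowing any of the 20
standard amino acids at each of the two randomized positions
(20 x 20 = 400). The sketch below enumerates such a library and counts
the members that carry an Asp or Glu at position 189 or 190; it is only
an illustration of the combinatorics, and the variable names are
hypothetical.

```python
from itertools import product

# The 20 standard amino acids, three-letter codes.
AMINO_ACIDS = (
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
)

# Every (position 189, position 190) combination in a fully randomized library.
LIBRARY = list(product(AMINO_ACIDS, repeat=2))

# Members that place a carboxylate side chain at either position.
ACIDIC = {"Asp", "Glu"}
WITH_CARBOXYLATE = [pair for pair in LIBRARY if ACIDIC & set(pair)]

if __name__ == "__main__":
    print(f"library size: {len(LIBRARY)}")
    print(f"members with Asp or Glu at 189 or 190: {len(WITH_CARBOXYLATE)}")
    print(f"wild type (Asp 189, Ser 190) present: {('Asp', 'Ser') in LIBRARY}")
```

The count of carboxylate-bearing members is an upper bound on the
sequence space in which, according to the text, activity toward Arg and
Lys substrates can be partially restored.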

A framework for the interpretation of these data is provided by kinetic
and crystallographic investigation of two other variants: trypsins
D189G/G226D (Perona et al., 1993b, 1993c) and D189S (Perona et al.,
1994). The structure of each mutant enzyme was determined complexed
with the protein inhibitors APPI and/or BPTI, which are analogs of the
substrate Michaelis complexes possessing Arg and Lys, respectively, at
the P1 position (Perona et al., 1993b). This allows for the direct
comparison of substrate-like interactions of Arg and Lys side chains in
the binding pockets of wild-type and mutant enzymes. Trypsin
D189G/G226D is equally reduced (10-fold) in binding affinity toward Lys
and Arg substrates and is sharply lowered (10^-fold) in kcat toward
Arg. The crystallographic analysis showed that Asp 226 is partially
sequestered from substrate by intramolecular interactions made with
Ser 190 and Tyr 228, such that only a single carboxylate oxygen is
available for substrate binding. Further, comparisons with the
wild-type interactions indicated no correlation between the binding
affinities of either Lys or Arg substrates and the number of direct
contacts made with Asp 226. Therefore, it appears that substrate
binding affinity to trypsin depends upon the accessibility of the
negative charge to substrate and not upon the formation of direct
interactions. This observation implies that direct electrostatic
hydrogen bonding interactions between the substrate Lys/Arg and the
enzyme carboxylate group do not significantly improve the free energy
of binding relative to indirect water-mediated interactions (Perona
et al., 1993c).



The crystal structure of trypsin D189S revealed that an acetate ion
from the crystallization buffer was trapped at the base of the binding
pocket, such that its carboxylate group was partially oriented toward
substrate (Perona et al., 1994; Fig. 10). Exogenously added acetate
provided up to 300-fold rate enhancements to trypsin D189S toward Arg-
and Lys-containing substrates, but catalytic activity remained
diminished relative to wild-type trypsin. This structure thus provides
a second example showing that optimal placement of the negative charge
in the binding pocket is critical to catalysis. Significantly, the
diminished activities of both trypsins D189G/G226D and D189S/acetate
are reflected in kcat as well as Km. Measurement of activities toward
analogous ester as well as amide substrates by these enzymes allows
calculation of the mechanistic parameters Ks, k2, and k3 (Zerner &
Bender, 1964; Fig. 2C), removing the ambiguity in interpretation of the
steady-state Michaelis-Menten parameters. This analysis shows that the
role of the Asp 189 carboxylate in trypsin is twofold: it provides both
tight binding affinity and a high acylation rate k2 (Perona et al.,
1994). Therefore, the precise location of the negatively charged group
within the trypsin S1 site is critical to positioning the scissile bond
in catalytic register with Ser 195 and His 57.
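
The ambiguity mentioned above arises because kcat and Km are each
composites of the mechanistic constants. The sketch below uses the
standard steady-state expressions for a three-step acyl-enzyme mechanism
(binding with dissociation constant Ks, acylation k2, deacylation k3);
it is assumed here that this is the scheme shown in Fig. 2C, and the
numerical inputs are arbitrary illustrative values, not data from the
cited studies.

```python
def steady_state_parameters(ks: float, k2: float, k3: float) -> dict:
    """Return kcat, Km, and kcat/Km for E + S <-> ES -> acyl-E -> E + P.

    ks : dissociation constant of the Michaelis complex (M)
    k2 : acylation rate constant (s^-1)
    k3 : deacylation rate constant (s^-1)
    """
    if min(ks, k2, k3) <= 0:
        raise ValueError("all constants must be positive")
    kcat = k2 * k3 / (k2 + k3)
    km = ks * k3 / (k2 + k3)
    # The ratio reduces to k2 / ks, independent of deacylation.
    return {"kcat": kcat, "Km": km, "kcat/Km": kcat / km}


if __name__ == "__main__":
    # Acylation-limited case (k2 << k3): kcat ~ k2 and Km ~ Ks.
    print(steady_state_parameters(ks=1e-3, k2=1.0, k3=100.0))
    # Deacylation-limited case (k2 >> k3): kcat ~ k3 and Km << Ks.
    print(steady_state_parameters(ks=1e-3, k2=100.0, k3=1.0))
```

Because the same kcat and Km can arise from different combinations of
Ks, k2, and k3, comparing substrates with different rate-limiting steps
is one way to separate binding from acylation.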

Analysis of the kinetic properties of the 20 variants isolated from the
genetic selection corroborates these hypotheses regarding the operation
of the S1 site. Although the binding constants of the enzymes vary
widely, it is significant that relative affinities for Lys versus Arg
substrates remain very similar (Perona et al., 1993a). The negatively
charged carboxylate in these mutants is provided by either Asp or Glu
at positions 189 or 190, and the partner to this residue is 1 of 10
different amino acids. Thus, it is very unlikely that equal reductions
in affinity toward Lys versus Arg substrates can in most cases be
attributed to an equal loss of hydrogen bonding or electrostatic
interactions. Instead, binding affinity is likely to be better
correlated with accessibility of the negative charge to substrate;
barring substrate-induced conformational changes, this accessibility
will be the same for both Lys and Arg substrates. Binding affinities
are then predicted to be weaker when the carboxylate is partially
sequestered from substrates, as seen in the structures of the mutants
D189G/G226D and D189S/acetate. Crystal structures of additional
variants from the selection pool should enable a quantitative
correlation between binding affinity and accessibility of the negative
charge. These experiments also explain the rationale for conservation
of the Asp at position 189 in the vast majority of trypsin homologs,
because other locations result in partial sequestration of the negative
charge.

In a second set of experiments, site-directed mutagenesis has been used
to convert trypsin into a chymotrypsin-like protease possessing high
selectivity for cleavage adjacent to large hydrophobic amino acids
(Hedstrom et al., 1992, 1994a, 1994b). The structures of the S1 pockets
of the two enzymes are very similar (Figs. 6, 11A), so it was expected
that specificity modification might be straightforward as in subtilisin
and α-lytic protease. However, when the amino acids directly in contact
with substrate were exchanged into trypsin, the resulting variants
D189S and D189S/Q192M/I138T/T218 failed to exhibit significant
improvement in cleavage of P1-Phe amide substrates (Graf et al., 1988;
Hedstrom et al., 1992; Table 3). Poor efficiency was also shown toward
trypsin substrates, as expected because the pocket lacks a negative
charge. The crystal structure of trypsin D189S showed that only very
local structural changes were introduced



Fig. 11. A: Comparison of the S1 sites of trypsin and chymotrypsin.
Van der Waals surfaces of each enzyme are shown with the position-189
amino acid (Asp in trypsin; Ser in chymotrypsin) indicated in red.
In yellow is the conserved Ser 190, which is oriented into the S1
pocket in trypsin but rotates out in chymotrypsin. The inserted
Thr 218 in chymotrypsin is shown in green. Two other amino acids
directly in or adjacent to the S1 site are Ile 138 (Thr 138 in
chymotrypsin), and Gln 192 (Met 192 in chymotrypsin). Although a
high degree of structural similarity is clear, exchange of these four
amino acids fails to transfer chymotryptic specificity to trypsin.
B: Structural determinants required to exchange substrate specificity
include two adjacent surface loops (loop 1 and loop 2) and an amino
acid (Tyr 172 in trypsin) in a third adjacent segment (loop 3). None
of these structural elements directly contact substrate (shown at top
in green lines). Trypsin is shown in red and chymotrypsin in green.




as a consequence of the substitution; the binding pocket maintains
a trypsin-like conformation (Perona et al., 1994). This
confirms that the small structural differences between trypsin
and chymotrypsin in the S1 site (Fig. 11A) must be critical determinants
of the specificity and must rely on more distant parts of
the structure for maintenance of their particular conformations. 

Exchange of the two surface loops, loop 1 and loop 2 (Fig. 11B),
resulted in the hybrid enzyme Tr→Ch[S1+L1+L2], which exhibited
an acylation rate constant k2 equal to that of chymotrypsin
toward peptidyl P1-Phe amide substrates (Hedstrom
et al., 1992; Table 3). However, the enzyme was still reduced by
nearly 10^[illegible]-fold in kcat/Km because of a very weak substrate
binding affinity. The mechanistic kinetic parameters k2, Ks,
and k3 were calculated for cleavage of both single-residue and
peptidyl P1-Phe amide substrates for the enzymes trypsin, chymotrypsin,
D189S, and Tr→Ch[S1+L1+L2]. These data showed
that, like chymotrypsin, the hybrid trypsin was able to use the
binding energy obtained by occupancy of the S2-S4 enzyme sites
to increase the acylation rate. They also demonstrated that,
among this series of enzymes, the key mechanistic step that determines
substrate specificity is not binding affinity, but instead
the chemical step of acylation (Hedstrom et al., 1992, 1994a). 
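
The link between these mechanistic constants and the steady-state parameters can be illustrated with a short calculation. What follows is a minimal sketch, assuming the standard three-step acyl-enzyme scheme (E + S ⇌ ES → acyl-enzyme → E + product), under which kcat = k2·k3/(k2 + k3), Km = Ks·k3/(k2 + k3), and therefore kcat/Km = k2/Ks. The function name is hypothetical and does not come from the cited studies. The example values are those listed for D189S in Table 3, read here as Ks, k2, and k3, which is an assumption about the column order.

```python
def steady_state_parameters(ks, k2, k3):
    """Return (kcat, km, kcat_over_km) for the three-step acyl-enzyme scheme.

    ks : dissociation constant of the enzyme-substrate complex (M)
    k2 : acylation rate constant (s^-1)
    k3 : deacylation rate constant (s^-1)
    """
    kcat = k2 * k3 / (k2 + k3)
    km = ks * k3 / (k2 + k3)
    return kcat, km, kcat / km


# Illustrative input: the D189S row of Table 3, assumed to list
# Ks = 0.015 M, k2 = 0.29 s^-1, k3 = 33 s^-1 (column order is an assumption).
kcat, km, efficiency = steady_state_parameters(ks=0.015, k2=0.29, k3=33.0)
print(f"kcat = {kcat:.3f} s^-1, Km = {km:.4f} M, kcat/Km = {efficiency:.1f} M^-1 s^-1")
```

Because k3 cancels in the ratio, kcat/Km under this scheme reports only on binding and acylation, so a change in either k2 or Ks appears directly in the catalytic efficiency.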

Further mutations were sought to improve catalytic efficiency 
toward chymotryptic substrates by increasing binding affinity. 
The additional mutation Y172W in a third adjacent surface loop
(Fig. 11B) produced the hybrid enzyme Tr→Ch[S1+L1+L2+Y172W],
which improves the activity of Tr→Ch[S1+L1+L2] by
20-50-fold, creating an enzyme with up to 15% of the activity
of chymotrypsin (Hedstrom et al., 1994b; Table 3). The improvement
toward a tetrapeptide P1-Phe amide substrate is
manifested almost entirely in tighter binding affinity. The relative
catalytic efficiencies measured toward Trp, Tyr, Phe, and
Leu P1-amide substrates also more closely mimic chymotrypsin
(Hedstrom et al., 1994b).



Table 3. Conversion of trypsin to chymotryptic specificity^a

Enzyme                    Ks (M)                      k2 (s^-1)   k3 (s^-1)
Trypsin                   >0.25                       >0.2        36
D189S                     0.015                       0.29        33
Tr→Ch[S1+L1+L2]           0.011                       20          37
Tr→Ch[S1+L1+L2+Y172W]     5.0 × 10^-[illegible]       41          63
Chymotrypsin              1.5 × 10^-[illegible]       850         52

^a Substrate: Suc-Ala-Ala-Pro-Phe-pNA.



The structural basis for the activities of the two hybrid trypsins
was elucidated by determination of their crystal structures
complexed with the transition-state inactivator Suc-Ala-Ala-Pro-
Phe-chloromethyl ketone (Suc-AAPF-CMK; Perona et al.,
1995). Loop 2 of Tr→Ch[S1+L1+L2] adopts a conformation
identical to that which it possesses in chymotrypsin. However,
amino acids at positions 185-187 within Loop 1 are disordered.
The structure of Tr→Ch[S1+L1+L2+Y172W] showed improved
order in Loop 1 and a rearrangement of solvent structure and
Ser 217 side-chain orientation, each of which more closely mimicked
the structure of chymotrypsin. No other changes were
present between the two hybrid enzymes, implicating these structural
elements as important determinants of Ks in chymotrypsin.

Both hybrid enzymes possess wild-type chymotrypsin-like acylation
rates k2 toward peptidyl P1-Phe amide substrates, and
each utilizes binding of the extended peptide (substrate sites P2-P4)
to increase this rate. In fact, the 10^[illegible]-fold specificity of
chymotrypsin relative to trypsin for cleavage at P1-Phe is manifested
solely in extended peptidyl substrates; only a 10^[illegible]-fold level
of discrimination exists for single-residue substrates (Hedstrom
et al., 1994b). In all available crystal structures of the enzymes,
including those of the trypsin hybrids, two hydrogen bonds are
formed in an antiparallel β-sheet fashion with the backbone amide
group of Gly 216 (Perona et al., 1995). The backbone conformation
at Gly 216 differs between trypsin and chymotrypsin;
the hybrid enzymes adopt a chymotrypsin-like conformation.
This suggests that the Gly 216 backbone is a critical specificity
determinant because it directly binds a portion of substrate
responsible for a 10^[illegible]-fold preference at position P1. The
mechanism by which Gly 216 functions is likely to be through promoting
accurate scissile bond positioning (Perona et al., 1995). Because
Asp 189 of trypsin also plays a critical role in this function, it
appears that the identity of the amino acid at position 189, and the
backbone conformation at Gly 216, must be matched in order to
permit efficient and specific catalysis by trypsin and chymotrypsin.

Structural comparisons among a number of the chymotrypsin-like
proteases, including both PPE and HNE, showed a striking
correlation between the P1-site specificity and the backbone
conformation at position 216 (Perona et al., 1995). Three structural
classes were delineated, which correspond to trypsin, chymotrypsin,
and elastase-like enzymes (Fig. 12). The role of Gly 216
in promoting accurate substrate positioning may thus be a feature
of many enzymes in the family. In this context it is relevant
to note that the kinetic phenomenon observed for both trypsin
(Perona et al., 1993c) and chymotrypsin (Hedstrom et al.,
1992), namely that subsite occupancy causes large increases in
the rates of the chemical steps of catalysis, is also common to
other trypsin-like enzymes including PPE (Thompson & Blout,



Fig. 12. A correlation is observed between the backbone conformation
of residue 216 and the S1 site substrate preference among all of the
trypsin-, chymotrypsin-, and elastase-like proteases of known structure.
Shown is a superposition of seven mammalian serine proteases
(color-coded), indicating the structure at this position that is most
easily visualized in the orientation of the carbonyl oxygen atom. Specific
trypsin-like, chymotrypsin-like, and elastase-like backbone angles
are observed. Residue 216 binds the P3 position of the substrate in all
the enzymes. Extended peptide binding to residue 216 is required both
to achieve full catalytic potency as well as to obtain a maximal level of
P1-site discrimination among alternative amino acids. Conversion of the
substrate specificity of trypsin to that of chymotrypsin requires
reorientation of Gly 216 to a chymotrypsin-like conformation. Thus, the
position-216 backbone is strongly suggested as an essential specificity
determinant in the mammalian trypsin-like proteases.



1970), HNE (Stein et al., 1987), SGPA (Bauer et al., 1976; 
Bauer, 1978), SGPB (Bauer, 1978), and a-lytic protease (Bauer 
et al., 1981; also see above). The significance of the recent kinetic
analysis (Hedstrom et al., 1992) is that it shows that both
the catalytic rate toward cognate substrates, as well as the degree
of specificity at the P1-position, are dependent on the filling
of subsites, which themselves exhibit little amino acid
preference. 

The crystal structures of the trypsin hybrids also address another
fundamental question in enzyme catalysis: the role of the
global protein structure. Distal structural elements such as Trp 172
and loops 1 and 2 play a key role in specifying the conformation
of residues that do interact directly with substrate. Thus,
their role is not solely to provide an inert platform that stabilizes
the amino acids that interact directly with substrate. These
elements of the global architecture play an active role in determining
substrate specificity as well, which should thus be viewed
as a more distributed property of the protein fold. An alternative
mechanism for the way in which global protein folds may
influence specificity is by modulating the degree of backbone
flexibility of the S1 site, as exemplified in the α-lytic protease
studies (Bone et al., 1991).

Exchange of the S1-site residues of HNE into trypsin also fails
to convert the specificity of trypsin and results, as in the case
of the mutants D189S and D189S/Q192M/I138T/T218, in a
poor nonspecific protease (J.J. Perona & C.S. Craik, unpubl.




obs.). Similarly, introduction of Lys, Arg, or His residues into
the trypsin S1 site has failed to generate specificity toward Asp
or Glu residues (Graf et al., 1987; Willett et al., 1995; J.J. Perona
& C.S. Craik, unpubl. obs.). A better mutational strategy
for specificity modification in trypsin may be the construction
of libraries that instead span the distal structural elements. When
coupled to strategies such as the genetic selection (Evnin et al.,
1990; Perona et al., 1993a) or phage display (Corey et al., 1993;
Fig. 4C) systems, it should be possible to search a large number
of different structures for those providing altered specificity.

Surface loops determine subsite specificity 
in the trypsin-class enzymes

We have seen that the best-studied members of the chymotrypsin- 
like class of serine proteases each manifest primary specificity 
at the PI site directly adjacent to the cleaved bond. However, 
there are also several enzymes in the class that possess significant
specificity toward substrate residues at a greater distance
in both the N- and C-terminal directions. Sequence alignments 
of these enzymes reveal that a number of surface loops flank- 
ing the catalytic residues are very likely to play crucial roles in 
determining this extended recognition selectivity (Fig. 13). 

One enzyme manifesting an extended subsite specificity that
is also of known tertiary structure is RMCPII (Woodbury et al.,
1978a, 1978b), a member of a homologous subclass of trypsin-like
serine proteases expressed also in other granulocytes
(Salvesen et al., 1987) as well as in lymphocytes (Lobe et al., 1986).
RMCPII and the related RMCPI (which possess 73% amino
acid sequence identity; LeTrong et al., 1987b) each manifest a




Fig. 13. Structure of trypsin, highlighting the positions of four surface
loops (loops A, B, C, D) involved in determining subsite preferences
among a number of the enzymes in the family. The location of these
loops relative to the catalytic machinery and binding cleft may be
contrasted with the position of the three loops (loops 1, 2, 3) that combine
to influence specificity in the S1 site. A polypeptide substrate chain is
shown in green and the catalytic triad is in yellow. It is clear that loop
C is positioned to interact with substrate residues N-terminal to the
scissile bond, whereas loops A and D are positioned to interact with the
C-terminal amino acids on the leaving-group side of the scissile bond.



chymotrypsin-like primary substrate specificity but also exhibit
preferences for hydrophobic amino acids in positions P2 and P3
(Yoshida et al., 1980; Powers et al., 1985). RMCPI also has been
shown to prefer hydrophobic residues at position P1' in polypeptide
substrates, although the extent of the selectivity has not
been established quantitatively (LeTrong et al., 1987a).

The crystal structure of uncomplexed RMCPII has been determined
at a resolution of 1.9 Å (Remington et al., 1988). This
structure suggests that the enhanced substrate selectivity of the
homologous RMCPI at the P1' position is likely to be provided
by the presence of a large cleft not found in the other chymotrypsin-like
proteases of known structure. The cleft is formed
as a consequence of an unusual conformation adopted by two
surface loops that lie adjacent to the catalytic residues (Remington
et al., 1988). The loops comprise residues 34-41 (loop A)
and 59-64 (loop B) and are positioned such as to be capable of
interacting directly with substrate residues C-terminal to the
scissile bond (Fig. 13). Modeling of a substrate complex with
RMCPII suggests that loop A is most likely to directly contact
the P1'-P2' substrate sites, whereas loop B plays a structural role
in helping to form the cleft.

The subclass of serine proteases to which RMCPII belongs
is distinguished by the absence of the otherwise well-conserved
disulfide bond linking residues 191 and 220 (LeTrong et al.,
1987b). In the other enzymes, this disulfide bridges the two walls
of the S1 site and likely provides a degree of structural rigidity
to the cavity (Fig. 7). RMCPII possesses a Phe residue at position
191 and a shortened loop L2 (residues 217-225) relative to
chymotrypsin; each of these features is conserved within the subclass
(LeTrong et al., 1987b). Modeling of a tripeptide substrate
possessing Phe at position P3 shows that the aromatic ring is
readily sandwiched between the side chains of Met 192 and
Pro 221A and also makes van der Waals interactions with Phe 191
(Remington et al., 1988). This small hydrophobic pocket is absent
in chymotrypsin owing to the presence of the Cys 191-
Cys 220 disulfide bond. Thus, the crystal structure provides a
plausible rationale explaining the 100-fold preference of RMCPI
and RMCPII for Phe relative to Gly at position P3 (Yoshida
et al., 1980).
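
A fold preference of this kind can be translated into an apparent free-energy difference. The sketch below is illustrative only. It assumes that the preference reflects a ratio of kcat/Km values, so that ΔΔG = RT ln(ratio); the temperature of 298.15 K and the function name are assumptions, not values taken from the cited work.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def fold_preference_to_ddg(fold, temperature_k=298.15):
    """Convert a fold preference (ratio of kcat/Km values) into kcal/mol."""
    if fold <= 0:
        raise ValueError("fold preference must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(fold)


# A 100-fold preference corresponds to roughly 2.7 kcal/mol
# at the assumed temperature of 298.15 K.
print(f"{fold_preference_to_ddg(100.0):.2f} kcal/mol")
```

On this scale each additional factor of 10 in selectivity is worth about 1.4 kcal/mol, which gives a rough sense of how much interaction energy a small hydrophobic pocket would need to supply.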

A second example of extended binding site specificity is provided
by the enzyme enteropeptidase (enterokinase), which functions
in vivo to cleave the zymogen trypsinogen at position Ile 16,
generating the new N-terminus required for trypsin activity
(reviewed in Huber & Bode, 1978). This enzyme hydrolyzes the
peptide bond directly C-terminal to the sequence (Asp)4Lys in
trypsinogen, and consequently possesses a trypsin-like specificity
toward positively charged amino acids in the P1 position. The
bovine and porcine enzymes exist as glycosylated disulfide-linked
heterodimers comprising a heavy chain of 115 kDa and a light
chain of 43 kDa (Magee et al., 1977; LaVallie et al., 1993).
Chemical modification studies established that the catalytic activity
and specificity of the enzyme resides in the light chain
(Light & Fonseca, 1984). Most recently, cloning and expression
of the light chain has revealed it to possess 35-40% sequence
identity to the trypsin-like class of serine proteases (LaVallie
et al., 1993). This study also demonstrated that this subunit possesses
full activity toward the fluorogenic peptide substrate
(Asp)4Lys-β-naphthylamide. The presence of the heavy chain,
however, endows the holoenzyme with 100-fold greater catalytic
efficiency toward the cognate trypsinogen substrate (LaVallie
et al., 1993).




Native enteropeptidase is capable of cleaving the (Asp)4Lys
sequence in trypsinogen with a catalytic efficiency roughly
10^[illegible]-fold greater than trypsin (Maroux et al., 1971). Mapping the
sequence of the light chain of the enzyme onto the structure of
trypsin indicates that the peptide Lys 96-Arg 97-Arg 98-Lys 99
(KRRK) is well positioned to play a direct role in interacting with
the negatively charged aspartates occupying positions P2-P5
(LaVallie et al., 1993). This peptide comprises a portion of a surface
loop located adjacent to Asp 102 (loop C; Fig. 13), on the
opposing side of the catalytic triad relative to the loops A and
B that form the cleft important to P1' recognition by RMCPI.

The kinetic basis for the improved specificity of enteropeptidase
relative to trypsin for recognition of the (Asp)4Lys sequence
is not yet known. By analogy with the known operation
of the pancreatic proteases, it would be predicted that the specificity
arises at least partly from the ability of enteropeptidase
to selectively accelerate the acylation rate of (Asp)4Lys-β-
naphthylamide relative to other peptidyl or to single-residue substrates.
It is tempting to speculate that enteropeptidase may use
a distinct structural mechanism, involving specific interactions
with the aspartates, to convert substrate binding energy into a
high catalytic rate. Inspection of the sequence alignment with
trypsin reveals further differences at positions 215-219 at the lip
of the S1 site, as well as the insertion of a residue in loop L3
(Fig. 13), each of which may be of importance to precise orientation
of the (Asp)4Lys substrate. Additionally, enteropeptidase
possesses a striking 10-residue insertion between residues 58 and
59, in the surface loop B that lies directly behind the KRRK sequence
of loop C (LaVallie et al., 1993; Fig. 13). Although loops
B and C do not contact each other in trypsin, the much larger
loop B in enteropeptidase would be capable of making interactions
conceivably of importance to maintaining correct orientation
of the KRRK residues.

A third example of the importance of surface loops in these 
enzymes relates to the inhibition of the trypsin-like tissue plasminogen
activator by plasminogen activator inhibitor 1 (Ny et al.,
1986). The interaction between TPA and PAI-1 is of importance
in the regulation of the cascade of activities involved in blood
clotting (Davie et al., 1991). Surface loop A of TPA (Fig. 13)
possesses a high density of positively charged amino acids (residues
Lys 296-His 297-Arg 298-Arg 299) that have been shown
to be critical to its interaction with a negatively charged region
of PAI-1 (Madison et al., 1990). This was confirmed in an elegant
experiment in which loop A in the homologous enzyme
thrombin was replaced with that of TPA, endowing PAI-1 sus- 
ceptibility onto thrombin (Horrevoets et al., 1993). Thus, both 
the extended substrate specificity as well as the specificity of in- 
teraction with physiologically important inhibitors can arise 
from contacts with the same surface loops. 

An important activity of crab collagenase is the ability to 
cleave native triple-helical collagen, a property not exhibited by 
the canonical pancreatic proteases (Eisen et al., 1973; Tsu et al., 
1994). Cleavage occurs within domains of the triple-helical substrate
that are relaxed from the strict Gly-Pro-Xaa repetitive sequence.
Detailed examination of the cleavage sites by protein
sequencing has shown that proteolysis of collagen occurs at positions
that mirror the P1-site selectivity (Tsu et al., 1994). Sequence
alignments of a range of serine collagenases from diverse
species fail to elucidate clear amino acid similarities that might
be correlated to the triple-helical specificity (Sinha et al., 1987;
Sellos & Van Wormhoudt, 1992). However, the crystal structure
of collagenase complexed with the dimeric protein inhibitor ecotin
has allowed construction of a model of collagen interacting
with the enzyme (J.J. Perona, C.A. Tsu, R.J. Fletterick, & C.S.
Craik, in prep.). Several surface loops, including loops A and D
(Fig. 13), may play crucial roles in recognition of the triple helix.

Recently, a novel assay has been introduced that provides the
possibility of assaying relative preferences at positions on the
leaving-group side of the scissile bond (Schellenberger et al.,
1993). In an initial study, the S1' subsite specificities of trypsin
and chymotrypsin from cow and rat were determined by monitoring
the reverse reaction of peptide hydrolysis. Acyl transfer
was measured to a mixture of 21 peptide nucleophiles of the general
structure H-Xaa-Ala-Ala-Ala-Ala-NH2; the decrease in
concentration of each nucleophile was monitored by HPLC and
represents a measure of the ability of that substrate to compete
with water for attack on the acyl enzyme. Chymotrypsin hydrolyzes
substrates possessing Arg and Lys at the substrate P1' position
roughly 10-fold more rapidly than does trypsin; this
selectivity is attributable to the presence of additional negatively
charged residues in two adjacent surface loops (see below). Trypsin
exhibits a slight preference for hydrophobic amino acids at
this position, relative to chymotrypsin. The data confirm the relative
lack of specificity of each enzyme at this position. Application
of the methodology to crab collagenase showed a 30-fold
preference for P1'-Arg residues; an Arg is also found on the
C-terminal side of several of the collagen cleavage sites of the
enzyme (Tsu et al., 1994). Data have also been obtained for specificities
at the subsites S1'-S3' for trypsin, chymotrypsin, α-lytic
protease, and the cercarial protease from Schistosoma mansoni;
in these cases, relative cleavage rates varied by factors of up to
10^[illegible]-fold (Schellenberger et al., 1994).
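
The competition logic behind an assay of this type can be sketched numerically. This is a hedged illustration and not the analysis used by Schellenberger et al. It assumes that every nucleophile in the mixture reacts with the same acyl-enzyme intermediate at a rate that is first order in nucleophile, in which case the ratio of the logarithmic depletions of two nucleophiles equals the ratio of their rate constants. The function name and the example peak areas are hypothetical.

```python
import math


def relative_reactivity(initial_a, final_a, initial_b, final_b):
    """Ratio k_a/k_b for two nucleophiles competing for one acyl-enzyme.

    Assumes d[N_i]/dt = -k_i [acyl-enzyme][N_i] for each nucleophile, so that
    ln([N_i]0/[N_i]) = k_i * integral([acyl-enzyme] dt) and the shared
    integral cancels in the ratio.
    """
    depletion_a = math.log(initial_a / final_a)
    depletion_b = math.log(initial_b / final_b)
    if depletion_b == 0:
        raise ValueError("reference nucleophile shows no depletion")
    return depletion_a / depletion_b


# Hypothetical HPLC peak areas (arbitrary units) before and after reaction.
ratio = relative_reactivity(initial_a=1.00, final_a=0.25, initial_b=1.00, final_b=0.50)
print(f"nucleophile A reacts {ratio:.1f}-fold faster than nucleophile B")
```

Measuring all nucleophiles in one mixture has the practical advantage that enzyme concentration and reaction time drop out of the comparison, since every nucleophile is exposed to the same acyl-enzyme history.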

It is clear from the many known structures of chymotrypsin- 
like serine proteases that loop C is invariably positioned to di- 
rectly contact the extended substrate on the N-terminal side, 
whereas loops A and D interact on the leaving group side. By 
contrast, loop B appears less likely to be involved in direct con- 
tacts but instead is positioned to stabilize the primary inter- 
actions made by the more forward loops (Fig. 13). Depending 
on the size and conformation of this loop in different enzymes, 
it might in principle be able to stabilize either loop A or C. A 
final example of specificity modification in this class involves 
loop D: introduction of histidine residues at the N- and C-terminal 
ends of this loop confers metal-dependent specificity for histidine
at the P2' substrate position onto rat trypsin (Willett et al.,
1995). In general, because subsite specificity of chymotrypsin- 
like proteases is modulated by surface loops rather than by core 
secondary structure elements, the prospects for engineering novel 
specificities, and for the development of "restriction proteases" 
that might recognize substrate sites from P5 to P2', seem hopeful. 

Conclusions and future directions 

One of the questions addressed in these studies is the role of 
water molecules in mediating enzyme-ligand interactions. Crystal
structures of wild-type and variant enzymes complexed with
substrate analogs, together with the measurement of affinity
constants, allow deduction of the importance of particular
interactions. In the recognition of basic Lys and Arg substrate 
side chains by Asp 189 of trypsin, the conclusion is that a water- 
mediated interaction can provide a comparable free energy gain 
to a direct contact (Perona et al., 1993c). These studies have im- 




plications to understanding protein-nucleic acid interactions. 
For example, the crystal structures of the trp repressor-operator 
complex, and of the uncomplexed operator DNA, suggest a 
crucial role for water-mediated interactions in providing DNA 
sequence specificity because no direct contacts with base func- 
tional groups are observed (Otwinowski et al., 1988; Shakked 
et al., 1994). Although a second-site reversion analysis of the 
operator DNA further implied a key role for the intervening wa- 
ters, it was clear that a structural analysis of the modified com- 
plexes is still required (Joachimiak et al., 1994). Such an analysis 
of the charge-charge interactions in the trypsin S1 site shows
more definitively that a specificity-determining role for solvent 
is in principle possible. A similar study of the trp repressor and 
of other systems is warranted, to address the extent to which this 
phenomenon may be dependent on the local structural context. 

Another fundamental question concerns the design of enzyme 
structures to provide different degrees of flexibility to the sub- 
strate binding site. The comparison of trypsin and a-lytic pro- 
tease offers an excellent opportunity to address this issue. Thus 
far, it appears from both kinetic and structural analysis of mu- 
tants that the trypsin pocket may be considerably more rigid. 
However, the two structures are homologous so that the degree 
of difference in the surrounding scaffolds is relatively small. 
Thus, the problem may be manageable: which specific interactions
bridging the primary and secondary shell residues are
most critical for determining flexibility? Are residues located
even more distant also important? An excellent test of our understanding
would be the construction of a trypsin variant with
chymotryptic specificity, which possessed far fewer than the 15
alterations of Tr→Ch[S1+L1+L2+Y172W]. If indeed the conformation
of Gly 216 is crucial to P1-site specificity, then the
problem reduces to adding certain key mutations to D189S such
that Gly 216 can reorient upon substrate binding, as it is ob- 
served to do in a-lytic protease (Bone et al., 1991; Fig. 9). A 
deeper understanding of flexibility would have clear application 
to protein folding and stability as well (Rose & Creamer, 1994). 

The degree to which a substrate binding cleft is inherently de- 
formable may be an important parameter governing the ease 
with which specificity modification can be effected. Prior to the 
advent of site-directed mutagenesis, it appeared possible that 
even conservative amino acid changes might cause highly dele- 
terious long-range structural effects. We now know that most 
substitutions are absorbed locally and that the majority of pro- 
tein structural contexts therefore have some ability to deform. 
Protein folding and stability often are not greatly perturbed even 
by very challenging mutations. The sensitivity of enzyme activ- 
ity to precise substrate positioning might alternatively suggest 
that mutation of the binding site would usually result in low cat- 
alytic activity. However, this appears not to be the case: among 
the well-studied binding pockets considered here, the subtilisin 
S1 and S4 sites, as well as the α-lytic protease S1 site, each are
readily modified to alter specificity with only limited local substitutions.
Only the trypsin S1 site requires extensive nonlocal
changes. 

Another reason for the difficulty in modifying trypsin sub- 
strate specificity could be that the charge-charge interactions 
present in a trypsin transition-state complex require a precise 
electrostatic environment not readily altered (Hwang & Warshel, 
1988). The electrostatic potential is presently the least under- 
stood force shaping enzyme structure and activity; it is also the 
only one that operates over large distances. Considerable efforts
are underway to improve empirical forcefields, so that catalytic
free energies can be accurately estimated directly from structural 
models. Serine proteases are a favored system in these studies 
owing to the large database of structure-activity information 
(Bash et al., 1987; Rao et al., 1987; Caldwell et al., 1991; Mizushima
et al., 1991; Wilson et al., 1991). Further mutational analysis
will thus also be invaluable in providing a testbed for new
algorithms. Greater insight into the connection between struc- 
ture and energetics will lead to much better predictive ability re- 
garding the consequences of mutation. This improved insight, 
together with the innovative technologies for the generation and 
screening of large libraries, may soon result in the creation of 
new, highly efficient proteases possessing a broad range of use- 
ful properties. 

Acknowledgments 

We thank Prof. R. Fletterick for helpful discussions. This work was supported
by NSF grants MCB-9219806 and BCS-9119237 to C.S.C. and
NIH postdoctoral NRSA award CM 13818-03 to J.J.P.



References 

Abrahmscn L, Tom J, Burnicr J, Butcher KA, Kossiakoff A, Wells JA, 1991. 
Engineering subtilisin and its substrates for efficient ligation of peptide 
bonds in aqueous solution. Biochemistry 50:4151-4159. 

Bachovchin WW, Roberts JD. 1978. Nitrogen- 15 nuclear magnetic resonance 
speciroscopy. The state of histidine in the catalytic triad of a-lytic pro- 
tease. Implications for the charge-relay mechanism of peptide-bond cleav- 
age by serine proteases. J Am Chem Soc 700:8041-8047. 

Bash PA, Singh UC, Langridge R, Kollman PA. 1987. Free energy calcula- 
tions by computer simulation. Science 236:564-566. 

Bauer CA. 1978. Active centers of Streptomyces griseus protease 1 , Strep- 
fomyces griseus protease 3, and a-chymotrypsin: Enzyme-substrate in- 
teractions. Biochemistry 77:375-380. 

Bauer CA. Brayer GD, Sielecki AR, James MNG. 1981. Active site of a- 
lytic protease. Enzyme-substrate interactions. Eur J Biochem 720:289- 
294. 

Bauer CA. Thompson RC, Blout ER, 1976. The active centers of Sfrepio- 
myces griseus protease 3 and a-chymotrypsin: Enzyme-substrate inter- 
actions remote from the scissile bond. Biochemistry /i: 1291-1295. 

Bazan JF, Fletterick RJ. 1990. Structural and catalytic models of irypsin- 
like viral proteases. Semin Virol /;31 1-322. 

Bech LM, Sorensen SB, Breddam K. 1992. Mutational replacements in sub- 
tilisin 309. Val 104 has a modulating effect on the P4 substrate prefer- 
ence. Eur J Biochem 209:869-874. 

Bech LM, Sorensen SB, Breddam K. 1993. Significance of hydrophobic 
S4-P4 interactions in subtilisin 309 from Bacillus lerttus. Biochemistry 32: 
2845-2852. 

Bender ML, Killheffer JV. 1973. Chymotrypsins, CRC Crit Rev Biochem 
7:149-199. 

Betzel C, Klupsch S, Papendorf G. Hastrup S, Branner S, Wilson KS. 1992. 
Crystal structure of the alkaline proteinase savin ase from Bacillus lent us 
at 1,4 A resolution. J Mol Bid 223:427-445. 

Betzel C, Pal GP, Saengcr W. 1988. Three-dimensional structure of protein- 
ase K at 0.15-nm resolution. Eur J Biochem 775: 155-171. 

Betzel C, Singh TP, Visanji M, Fittkau S, Sacnger W, Wilson KS. 1993. Struc- 
ture of the complex of proteinase K with a substrate analogue hexapep- 
tidc inhibitor at 2.2-A resolution. J Biol Chem 2<SS: 15854- 1 5858. 

Blow DM. 1976. Structure and mechanism of chymoirypsin. Acv Chem Res 
9:145-152. 

Blow DM, Birktofl J J. Hartley BS. 1969. Role of a buried acid group in the 
mechanism of action of chymotrypsin. Nature 22l :i31-iM). 

Bode W, Chen Z, Bands K, Kuizbach C, Schmidt-Kaslner G, Bartunik H. 
1983. Refined 2 A X-ray crystal structure of porcine pancreatic kallikrein 
A, a specific trypsin-likc serine proteinase — Crystallizaiion, structure de- 
termination, crystallographic rermement, structure and its comparison 
with bovine trypsin. J Mol Biol 7tf4:237-282. 

Bode W, Mayr I, Baumann U, Huber R, Stone SR, Hofsteenge J. 1989a.
The refined 1.9 Å crystal structure of human α-thrombin: Interaction



Substrate specificity in serine proteases 







with D-Phe-Pro-Arg chloromethyl ketone and significance of the
Tyr-Pro-Pro-Trp insertion segment. EMBO J 8:3467-3475.

Bode W, Meyer E Jr, Powers JC. 1989b. Human leukocyte and porcine
pancreatic elastase: X-ray crystal structures, mechanism, substrate
specificity, and mechanism-based inhibitors. Biochemistry 28:1951-1963.

Bode W, Papamokos E, Musil D, Seemuller U, Fritz H. 1986a. Refined 1.2 Å
crystal structure of the complex formed between subtilisin Carlsberg and
the inhibitor eglin c. Molecular structure of eglin and its detailed
interaction with subtilisin. EMBO J 5:813-818.

Bode W, Walter J, Huber R, Wenzel HR, Tschesche H. 1984. The refined
2.2 Å X-ray crystal structure of the ternary complex formed by bovine
trypsinogen, valine-valine and the Arg 15 analogue of bovine pancreatic
trypsin inhibitor. Eur J Biochem 144:185-190.

Bode W, Wei AZ, Huber R, Meyer EF Jr, Travis J, Neumann S. 1986b. X-ray
crystal structure of the complex of human leukocyte elastase (PMN
elastase) and the third domain of the turkey ovomucoid inhibitor. EMBO
J 5:2453-2458.

Bone R, Agard DA. 1991. Mutational remodeling of enzyme specificity. 
Methods Enzymol 202:643-671 . 

Bone R, Frank D, Kettner CA, Agard DA. 1989a. Structural analysis of
specificity: α-Lytic protease complexes with analogues of reaction
intermediates. Biochemistry 28:7600-7609.

Bone R, Fujishige A, Kettner CA, Agard DA. 1991. Structural basis for
broad specificity in α-lytic protease mutants. Biochemistry
30:10388-10398.

Bone R, Shenvi AB, Kettner CA, Agard DA. 1987. Serine protease
mechanism: Structure of an inhibitory complex of α-lytic protease and a
tightly bound peptide boronic acid. Biochemistry 26:7609-7614.

Bone R, Silen JL, Agard DA. 1989b. Structural plasticity broadens the
specificity of an engineered protease. Nature 339:191-195.

Braxton S, Wells JA. 1991. The importance of a distal hydrogen bonding
group in stabilizing the transition state in subtilisin BPN'. J Biol Chem
266:11797-11800.

Brayer GD, Delbaere LTJ, James MNG. 1979. Molecular structure of the
α-lytic protease from Myxobacter 495 at 2.8 Å resolution. J Mol Biol

Brenner C, Fuller RS. 1992. Structural and enzymatic characterization of
a purified, prohormone-processing enzyme: Secreted, soluble Kex2
protease. Proc Natl Acad Sci USA 89:922-926.

Bryan P, Pantoliano MW, Quill SG, Hsiao HY, Poulos T. 1986. Site-directed
mutagenesis and the role of the oxyanion hole in subtilisin. Proc Natl
Acad Sci USA 83:3743-3745.

Caldwell JW, Agard DA, Kollman PA. 1991. Free energy calculations on
binding and catalysis by α-lytic protease: The role of substrate size in
the P1 pocket. Proteins Struct Funct Genet 10:140-148.

Caputo A, James MNG, Powers JC, Hudig D, Bleackley RC. 1994.
Conversion of the substrate specificity of mouse proteinase granzyme B.
Struct Biol 1:364-367.

Carter P, Abrahmsen L, Wells JA. 1991. Probing the mechanism and
improving the rate of substrate-assisted catalysis in subtilisin BPN'.
Biochemistry 30:6142-6148.

Carter P, Nilsson B, Burnier JP, Burdick D, Wells JA. 1989. Engineering 
subtilisin BPN' for site-specific proteolysis. Proteins Struct Funct Ge- 
net 6:240-248. 

Carter P, Wells JA. 1987. Engineering enzyme specificity by
"substrate-assisted catalysis." Science 237:394-399.

Carter P, Wells JA. 1988. Dissecting the catalytic triad of a serine protease.
Nature 332:564-568.

Carter P, Wells JA. 1990. Functional interactions among catalytic residues
in subtilisin BPN'. Proteins Struct Funct Genet 7:335-342.

Chasan R, Anderson KV. 1989. The role of Easter, an apparent serine
protease, in organizing the dorsal-ventral pattern of the Drosophila
embryo. Cell 56:391-400.

Corey DR, Craik CS. 1992. An investigation into the minimum requirements
for peptide hydrolysis by mutation of the catalytic triad of trypsin. J Am
Chem Soc 114:1784-1790.

Corey DR, McGrath ME, Vasquez JR, Fletterick RJ, Craik CS. 1992. An
alternate geometry for the catalytic triad of serine proteases. J Am Chem
Soc 114:4906-4907.

Corey DR, Shiau AK, Yang Q, Janowski B, Craik CS. 1993. Trypsin
display on the surface of bacteriophage. Gene 128:129-134.

Craik CS, Largman C, Fletcher T, Roczniak S, Barr PJ, Fletterick RJ, Rutter
WJ. 1985. Redesigning trypsin: Alteration of substrate specificity.
Science 228:291-297.

Craik CS, Roczniak S, Largman C, Rutter WJ. 1987. The catalytic role of
the active site aspartic acid in serine proteases. Science 237:909-913.

Creemers JWM, Siezen RJ, Roebroek AJM, Ayoubi TAY, Huylebroeck D,
Van de Ven WJM. 1993. Modulation of furin-mediated proprotein
processing activity by site-directed mutagenesis. J Biol Chem
268:21826-21834.

Dancer SJ, Garratt R, Saldanha J, Jhoti H, Evans R. 1990. The
epidermolytic toxins are serine proteases. FEBS Lett 268:129-132.

Davie EW, Fujikawa K, Kisiel W. 1991. The coagulation cascade: Initiation,
maintenance and regulation. Biochemistry 30:10363-10370.

Delbaere LTJ, Brayer GD, James MNG. 1979. The 2.8 Å resolution
structure of Streptomyces griseus protease B and its homology with
α-chymotrypsin and Streptomyces griseus protease A. Can J Biochem
57:135-144.

Dixon GH, Go S, Neurath H. 1956. Peptides combined with
14C-diisopropyl phosphoryl following degradation of 14C-DIP-trypsin with
α-chymotrypsin. Biochim Biophys Acta 19:193-200.

Drapeau GR. 1978. The primary structure of staphylococcal protease. Can
J Biochem 56:534-544.

Eder J, Rheinnecker M, Fersht AR. 1993. Hydrolysis of small peptide
substrates parallels binding of chymotrypsin inhibitor 2 for mutants of
subtilisin BPN'. FEBS Lett 335:349-352.

Eisen AZ, Henderson KO, Jeffrey JJ, Bradshaw RA. 1973. A collagenolytic
protease from the hepatopancreas of the fiddler crab, Uca pugilator.
Purification and properties. Biochemistry 12:1814-1822.

Epstein DM, Abeles RH. 1992. Role of serine 214 and tyrosine 171,
components of the S2 subsite of α-lytic protease, in catalysis.
Biochemistry 31:11216-11223.

Estell DA, Graycar TP, Miller JV, Powers DB, Burnier JP, Ng PG, Wells
JA. 1986. Probing steric and hydrophobic effects on enzyme-substrate
interactions by protein engineering. Science 233:659-663.

Evnin LB, Vasquez JR, Craik CS. 1990. Substrate specificity of trypsin
investigated by using a genetic selection. Proc Natl Acad Sci USA
87:6659-6663.

Fujinaga M, Delbaere LTJ, Brayer GD, James MNG. 1985. Refined
structure of α-lytic protease at 1.7 Å resolution. Analysis of hydrogen
bonding and solvent structure. J Mol Biol 184:479-502.

Fujinaga M, James MNG. 1987. Rat submaxillary gland serine protease,
tonin. Structure solution and refinement at 1.8 Å resolution. J Mol Biol
195:373-396.

Fuller RS, Brake A, Thorner J. 1989. Yeast prohormone processing enzyme
(KEX2 gene product) is a Ca2+-dependent serine protease. Proc Natl
Acad Sci USA 86:1434-1438.

Graf L, Craik CS, Patthy A, Roczniak S, Fletterick RJ, Rutter WJ. 1987.
Selective alteration of substrate specificity by replacement of aspartic
acid-189 with lysine in the binding pocket of trypsin. Biochemistry
26:2616-2623.

Graf L, Jancso A, Szilagyi L, Hegyi G, Pinter K, Naray-Szabo G, Hepp J,
Medzihradszky K, Rutter WJ. 1988. Electrostatic complementarity in the
substrate binding pocket of trypsin. Proc Natl Acad Sci USA
85:4961-4965.

Graham LD, Haggett KD, Jennings PA, Le Brocque DS, Whittaker RG,
Schober PA. 1993. Random mutagenesis of the substrate binding site of
a serine protease can generate enzymes with increased activities and
altered primary specificities. Biochemistry 32:6250-6258.

Grant GA, Eisen AZ. 1980. Substrate specificity of the collagenolytic
serine protease from Uca pugilator: Studies with noncollagenous substrates.
Biochemistry 19:6089-6095.

Grant GA, Henderson KO, Eisen AZ, Bradshaw RA. 1980. Amino acid
sequence of a collagenolytic protease from the hepatopancreas of the
fiddler crab, Uca pugilator. Biochemistry 19:4653-4659.

Greer J. 1990. Comparative modeling methods: Application to the family 
of the mammalian serine proteases. Proteins Struct Funct Genet 7:317- 
334. 

Grøn H, Breddam K. 1992. Interdependency of the binding sites in
subtilisin. Biochemistry 31:8967-8971.

Grøn H, Meldal M, Breddam K. 1992. Extensive comparison of the substrate
preferences of two subtilisins as determined with peptide substrates which
are based on the principle of intramolecular quenching. Biochemistry
31:6011-6018.

Gros P, Betzel C, Dauter Z, Wilson KS, Hol WGJ. 1989. Molecular
dynamics refinement of a thermitase-eglin-c complex at 1.98 Å resolution and
comparison of two crystal forms that differ in calcium content. J Mol
Biol 210:347-367.

Guo J, Huang W, Scanlan TS. 1994. Kinetic and mechanistic
characterization of a highly active hydrolytic antibody: Evidence for the
formation of an acyl intermediate. J Am Chem Soc 116:6062-6069.

Harper JW, Cook RR, Roberts CJ, McLaughlin BJ, Powers JC. 1984.
Active site mapping of the serine proteases human leukocyte elastase,
cathepsin G, porcine pancreatic elastase, rat mast cell proteases I and II,
bovine chymotrypsin Aα and S. aureus protease V-8 using tripeptide
thiobenzyl ester substrates. Biochemistry 23:2995-3002.






Hedstrom L, Farr-Jones S, Kettner CA, Rutter WJ. 1994a. Converting
trypsin to chymotrypsin: Ground state binding does not determine substrate
specificity. Biochemistry 33:8764-8769.

Hedstrom L, Perona JJ, Rutter WJ. 1994b. Converting trypsin to
chymotrypsin: Residue 172 is a substrate specificity determinant.
Biochemistry 33:8757-8763.

Hedstrom L, Szilagyi L, Rutter WJ. 1992. Converting trypsin to
chymotrypsin: The role of surface loops. Science 255:1249-1253.

Higaki J, Evnin LB, Craik CS. 1989. Introduction of a cysteine protease
active site into trypsin. Biochemistry 28:9256-9263.

Higaki JN, Haymore BL, Chen S, Fletterick R, Craik CS. 1990. Regulation
of serine protease activity by an engineered metal switch. Biochemistry
29:8582-8586.

Horrevoets AJG, Tans G, Smilde AE, van Zonneveld AJ, Pannekoek H.
1993. Thrombin-variable region 1. Evidence for the dominant contribution
of VR1 of serine proteases to their interaction with plasminogen
activator inhibitor 1. J Biol Chem 268:779-782.

Huber R, Bode W. 1978. Structural basis of the activation and action of
trypsin. Acc Chem Res 11:114-122.

Hwang JK, Warshel A. 1988. Why ion pair reversal by protein engineering 
is unlikely to succeed. Nature 334:270-272. 

Jackson DY, Burnier J, Quan C, Stanley M, Tom J, Wells JA. 1994. A
designed peptide ligase for total synthesis of ribonuclease A with
unnatural catalytic residues. Science 266:243-247.

Jackson SE, Fersht AR. 1993. Contribution of long-range electrostatic
interactions to the stabilization of the catalytic transition state of the
serine protease subtilisin BPN'. Biochemistry 32:13909-13916.

James MNG. 1976. Relationship between the structures and activities of some
microbial serine proteases. II. Comparison of the tertiary structures of
microbial and pancreatic serine proteases. In: Ribbons DW, Brew K, eds.
Proteolysis and physiological regulation. New York: Academic Press.
pp 125-142.

Joachimiak A, Haran TE, Sigler PB. 1994. Mutagenesis supports water
mediated recognition in the trp repressor-operator system. EMBO J
13:367-372.

Kabsch W, Sander C. 1983. Dictionary of protein secondary structure:
Pattern recognition of hydrogen-bonded and geometrical features.
Biopolymers 22:2577-2637.

Kahne D, Still WC. 1988. Hydrolysis of a peptide bond in neutral water. J
Am Chem Soc 110:7529-7534.

Kettner CA, Shenvi AB. 1984. Inhibition of the serine proteases leukocyte
elastase, pancreatic elastase, cathepsin G and chymotrypsin by peptide
boronic acids. J Biol Chem 259:15106-15114.

Kossiakoff AA, Spencer SA. 1981. Direct determination of the protonation
states of aspartic acid-102 and histidine-57 in the tetrahedral
intermediate of the serine proteases: Neutron structure of trypsin.
Biochemistry 20:6462-6473.

Kraut J. 1977. Serine proteases: Structure and mechanism of catalysis. Annu
Rev Biochem 46:331-358.

LaVallie ER, Rehemtulla A, Racie LA, DiBlasio HA, Ferentz C, Grant KL,
Light A, McCoy JM. 1993. Cloning and functional expression of a cDNA
encoding the catalytic subunit of bovine enterokinase. J Biol Chem
268:23311-23317.

Lazure C, Seidah NG, Pelaprat D, Chretien M. 1983. Proteases and
posttranslational processing of prohormones: A review. Can J Biochem Cell
Biol 61:501-515.

LeTrong H, Neurath H, Woodbury RG. 1987a. Substrate specificity of the
chymotrypsin-like protease in secretory granules isolated from rat mast
cells. Proc Natl Acad Sci USA 84:364-367.

LeTrong H, Parmelee DD, Walsh KA, Neurath H, Woodbury RG. 1987b.
Amino acid sequence of rat mast cell protease I (chymase). Biochemistry
26:6988-6994.

Liao D, Breddam K, Sweet RM, Bullock T, Remington SJ. 1992. Refined
atomic model of wheat serine carboxypeptidase II at 2.2 Å resolution.
Biochemistry 31:9796-9812.

Liao D, Remington SJ. 1990. Structure of wheat serine carboxypeptidase II
at 3.5 Å resolution. A new class of serine protease. J Biol Chem
265:6528-6531.

Light A, Fonseca P. 1984. The preparation and properties of the catalytic
subunit of bovine enterokinase. J Biol Chem 259:13195-13198.

Lobe CG, Finlay BB, Paranchych W, Paetkau VH, Bleackley RC. 1986.
Novel serine proteases encoded by two cytotoxic T lymphocyte-specific
genes. Science 232:858-861.

Loewenthal R, Sancho J, Reinikainen T, Fersht AR. 1993. Long-range
surface charge-charge interactions in proteins. Comparison of
experimental results with calculations from a theoretical method. J Mol
Biol 232:574-583.

Madison EL, Goldsmith EJ, Gerard RD, Gething MJH, Sambrook JF,



J.J. Perona and CS. Craik 

Bassel-Duby RS. 1990. Amino acid residues that affect interaction of
tissue-type plasminogen activator with plasminogen activator inhibitor
1. Proc Natl Acad Sci USA 87:3530-3533.

Magee AI, Grant DAW, Hermon-Taylor J. 1977. The apparent molecular
weights of human intestinal aminopeptidase, enterokinase and maltase
in native duodenal fluid. Biochem J 165:583-585.

Markland FS, Smith EL. 1971. Subtilisins: Primary structure, chemical and
physical properties. In: Boyer PD, ed. The enzymes, vol 3. New York:
Academic Press. pp 516-608.

Markley JL. 1979. Catalytic groups of serine proteases. NMR investigations.
In: Shulman RG, ed. Biological applications of magnetic resonance. New
York: Academic Press. pp 397-461.

Maroux S, Baratti J, Desnuelle P. 1971. Purification and specificity of
porcine enterokinase. J Biol Chem 246:5031-5039.

Matthews BW. 1977. X-ray structure of proteins. In: Neurath H, Hill RL,
eds. The proteins, vol 3. New York: Academic Press. pp 404-590.

Matthews BW, Sigler PB, Henderson R, Blow DM. 1967. Three-dimensional
structure of tosyl-α-chymotrypsin. Nature 214:652-656.

Matthews DJ, Wells JA. 1993. Substrate phage: Selection of protease
substrates by monovalent phage display. Science 260:1113-1117.

Matthews G, Shennan KI, Seal AJ, Taylor NA, Colman A, Docherty K. 1994.
Autocatalytic maturation of the prohormone convertase PC2. J Biol
Chem 269:588-592.

McGrath ME, Haymore BL, Summers NL, Craik CS, Fletterick RJ. 1993.
Structure of an engineered, metal-actuated switch in trypsin.
Biochemistry 32:1914-1919.

McGrath ME, Vasquez JR, Craik CS, Yang AS, Honig B, Fletterick RJ.
1992. Perturbing the polar environment of Asp 102 in trypsin:
Consequences of replacing conserved Ser 214. Biochemistry 31:3059-3064.

McGrath M, Wilke ME, Higaki JN, Craik CS, Fletterick R. 1989. Crystal
structures of two engineered thiol trypsins. Biochemistry
28:9264-9270.

McPhalen CA, James MNG. 1988. Structural comparison of two serine
proteinase-protein inhibitor complexes: Eglin c-subtilisin Carlsberg and
CI-2-subtilisin Novo. Biochemistry 27:6582-6598.

Mizushima N, Spellmeyer D, Hirono S, Pearlman D, Kollman P. 1991. Free
energy perturbation calculations on binding and catalysis after
mutating threonine 220 in subtilisin. J Biol Chem 266:11801-11809.

Mortensen UH, Remington SJ, Breddam K. 1994. Site-directed
mutagenesis on (serine) carboxypeptidase Y: A hydrogen-bond network stabilizes
the transition state by interaction with the C-terminal carboxylate of the
substrate. Biochemistry 33:508-517.

Moult J, Sussman F, James MNG. 1985. Electron density calculations as an
extension of protein structure refinement. Streptomyces griseus
protease A at 1.5 Å resolution. J Mol Biol 182:555-566.

Murphy MEP, Moult J, Bleackley RC, Gershenfeld H, Weissman IL, James
MNG. 1988. Comparative molecular model building of two serine
proteinases from cytotoxic T lymphocytes. Proteins Struct Funct Genet
4:190-204.

Nakatsuka T, Sasaki T, Kaiser ET. 1987. Peptide segment coupling catalyzed
by the semisynthetic enzyme thiolsubtilisin. J Am Chem Soc
109:3808-3810.

Narayana SV, Carson M, el-Kabbani O, Kilpatrick JM, Moore D, Chen X,
Bugg CE, DeLucas LJ. 1994. Structure of human factor D. A
complement system protein at 2.0 Å resolution. J Mol Biol 235:695-708.

Navia MA, McKeever BM, Springer JP, Lin TY, Williams HR, Fluder EM,
Dorn CP, Hoogsteen K. 1989. Structure of human neutrophil elastase
in complex with a peptide chloromethyl ketone inhibitor at 1.84 Å
resolution. Proc Natl Acad Sci USA 86:7-11.

Neurath H. 1984. Evolution of proteolytic enzymes. Science 224:350-357. 

Neurath H. 1985. Proteolytic enzymes, past and present. Fed Proc 44: 
2907-2913. 

Nienaber VL, Breddam K, Birktoft JJ. 1993. A glutamic acid specific
serine protease utilizes a novel histidine triad in substrate binding.
Biochemistry 32:11469-11475.

Ny T, Sawdey M, Lawrence D, Millan JL, Loskutoff DJ. 1986. Cloning and
sequence of a cDNA coding for the human beta-migrating
endothelial-cell-type plasminogen activator inhibitor. Proc Natl Acad Sci USA
83:6776-6780.

Odake S, Kam CM, Narasimhan L, Poe M, Blake JT, Krahenbuhl O,
Tschopp J, Powers JC. 1991. Human and murine cytotoxic T-lymphocyte
serine proteases: Subsite mapping with peptide thioester substrates and
inhibition of enzyme activity and cytolysis by isocoumarins.
Biochemistry 30:2217-2227.

Ollis DL, Cheah E, Cygler M, Dijkstra B, Frolow F, Franken SM, Harel M,
Remington SJ, Silman I, Schrag J, Sussman JL, Verschueren KHG,
Goldman A. 1992. The alpha/beta hydrolase fold. Protein Eng 5:197-211.

Otwinowski Z, Schevitz RW, Zhang RG, Lawson CL, Joachimiak A,
Marmorstein RQ, Luisi BF, Sigler PB. 1988. Crystal structure of trp
repressor/operator at atomic resolution. Nature 335:321-329.

Padmanabhan K, Padmanabhan KP, Tulinsky A, Park CH, Bode W, Huber R,
Blankenship DT, Cardin AD, Kisiel W. 1993. Structure of human
des(1-45) factor Xa at 2.2 Å resolution. J Mol Biol 232:947-966.

Perona JJ, Evnin LB, Craik CS. 1993a. A genetic selection elucidates
structural determinants of arginine versus lysine specificity in trypsin. Gene
137:121-126.

Perona JJ, Hedstrom L, Rutter WJ, Fletterick RJ. 1995. Structural origins
of substrate discrimination in trypsin and chymotrypsin. Biochemistry
34:1489-1499.

Perona JJ, Hedstrom L, Wagner R, Rutter WJ, Craik CS, Fletterick RJ.
1994. Exogenous acetate reconstitutes the enzymatic activity of Asp 189
Ser trypsin. Biochemistry 33:3252-3259.

Perona JJ, Tsu CA, Craik CS, Fletterick RJ. 1993b. Crystal structures of
rat anionic trypsin complexed with the protein inhibitors APPI and BPTI.
J Mol Biol 230:919-933.

Perona JJ, Tsu CA, McGrath ME, Craik CS, Fletterick RJ. 1993c.
Relocating a negative charge in the binding pocket of trypsin. J Mol Biol
230:934-949.

Polgar L. 1989. Structure and function of serine proteases. In: Mechanisms 
of protease action. Boca Raton, Florida: CRC Press. Chapter 3. 

Polgar L. 1991. pH-dependent mechanism in the catalysis of prolyl
endopeptidase from pig muscle. Eur J Biochem 197:441-447.

Poulos TL, Alden RA, Freer ST, Birktoft JJ, Kraut J. 1976. Polypeptide
halomethyl ketones bind to serine proteases as analogs of the tetrahedral
intermediate. J Biol Chem 251:1097-1103.

Powers JC, Tanaka T, Harper JW, Minematsu Y, Barker L, Lincoln D,
Crumley KV, Fraki JE, Schechter NM, Lazarus GG, Nakajima K,
Nakashino K, Neurath H, Woodbury RG. 1985. Mammalian
chymotrypsin-like enzymes. Comparative reactivities of rat mast cell proteases,
human and dog skin chymases, and human cathepsin G with peptide
4-nitroanilide substrates and with peptide chloromethyl ketone and
sulfonyl fluoride inhibitors. Biochemistry 24:2048-2058.

Rao SN, Singh UC, Bash PA, Kollman PA. 1987. Free energy perturbation
calculations on binding and catalysis after mutating Asn 155 in
subtilisin. Nature 328:551-554.

Read RJ, James MNG. 1988. Refined crystal structure of Streptomyces 
griseus trypsin at 1.7 A resolution. J Mol Biol 200:523-551 . 

Rehemtulla A, Barr PJ, Rhodes CJ, Kaufman RJ. 1993. PACE4 is a
member of the mammalian propeptidase family that has overlapping but not
identical substrate specificity to PACE. Biochemistry 32:11586-11590.

Remington SJ, Woodbury RG, Reynolds RA, Matthews BW, Neurath H.
1988. The structure of rat mast cell protease at 1.9-Å resolution.
Biochemistry 27:8097-8105.

Rheinnecker M, Baker G, Eder J, Fersht AR. 1993. Engineering a novel
specificity in subtilisin BPN'. Biochemistry 32:1199-1203.

Rheinnecker M, Eder J, Pandey PS, Fersht AR. 1994. Variants of subtilisin
BPN' with altered specificity profiles. Biochemistry 33:221-225.

Robertus JD, Alden RA, Birktoft JJ, Kraut J, Powers JC, Wilcox PE. 1972a.
An X-ray crystallographic study of the binding of peptide chloromethyl
ketone inhibitors to subtilisin BPN'. Biochemistry 11:2439-2449.

Robertus JD, Kraut J, Alden RA, Birktoft J. 1972b. Subtilisin: A
stereochemical mechanism involving transition-state stabilization.
Biochemistry 11:4293-4303.

Rose GD, Creamer TP. 1994. Protein folding: Predicting predicting.
Proteins Struct Funct Genet 19:1-3.

Ruhlmann A, Kukla D, Schwager P, Bartels K, Huber R. 1973. Structure
of the complex formed by bovine trypsin and bovine pancreatic trypsin
inhibitor. J Mol Biol 77:417-436.

Russell AJ, Thomas PG, Fersht AR. 1987. Electrostatic effects on
modification of charged groups in the active site cleft of subtilisin by protein
engineering. J Mol Biol 193:803-813.

Salvesen G, Farley D, Shuman J, Przybyla A, Reilly C, Travis J. 1987.
Molecular cloning of human cathepsin G: Structural similarity to mast cell
and cytotoxic T lymphocyte proteinases. Biochemistry 26:2289-2293.

Schechter I, Berger A. 1968. On the size of the active site in proteases. I.
Papain. Biochem Biophys Res Commun 27:157-162.

Schellenberger V, Turck CW, Hedstrom L, Rutter WJ. 1993. Mapping the
S' subsites of serine proteases using acyl transfer to mixtures of peptide
nucleophiles. Biochemistry 32:4349-4353.

Schellenberger V, Turck CW, Rutter WJ. 1994. Role of the S' subsites in
serine protease catalysis. Active-site mapping of rat chymotrypsin, rat
trypsin, α-lytic protease and cercarial protease from Schistosoma mansoni.
Biochemistry 33:4251-4257.

Seidah NG, Day R, Marcinkiewicz M, Benjannet S, Chretien M. 1991.
Mammalian neural and endocrine pro-protein and pro-hormone convertases
belonging to the subtilisin family of serine proteases. Enzyme 45:271-284.



Sellos D, Van Wormhoudt A. 1992. Molecular cloning of a cDNA that
encodes a serine protease with chymotryptic and collagenolytic activities
in the hepatopancreas of the shrimp Penaeus vannamei (Crustacea,
Decapoda). FEBS Lett 309:219-224.

Shakked Z, Guzikevich-Guerstein G, Frolow F, Rabinovich D, Joachimiak
A, Sigler PB. 1994. Determinants of repressor-operator recognition from
the structure of the trp operator binding site. Nature 368:469-473.

Shaw E, Mares-Guia M, Cohen W. 1965. Evidence for an active site
histidine in trypsin through use of a specific reagent,
1-chloro-3-tosylamido-7-amino-2-heptanone, the chloromethyl ketone derived from
Nα-tosyl-L-lysine. Biochemistry 4:2219-2226.

Siezen RJ, Bruinenberg PG, Vos P, van Alen-Boerrigter I, Nijhuis M, Alting
AC, Exterkate FA, de Vos WM. 1993. Engineering of the
substrate-binding region of the subtilisin-like, cell-envelope proteinase of
Lactococcus lactis. Protein Eng 6:927-937.

Siezen RJ, de Vos WM, Leunissen JAM, Dijkstra BW. 1991. Homology
modelling and protein engineering strategy of subtilases, the family of
subtilisin-like serine proteases. Protein Eng 4:719-737.

Sinha S, Watorek W, Karr S, Giles J, Bode W, Travis J. 1987. Primary
structure of human neutrophil elastase. Proc Natl Acad Sci USA
84:2228-2232.

Smeekens SP, Avruch AS, LaMendola J, Chan SJ, Steiner DF. 1991.
Identification of a cDNA encoding a second putative prohormone
convertase related to PC2 in AtT20 cells and islets of Langerhans. Proc Natl
Acad Sci USA 88:340-344.

Smith CL, DeLotto R. 1994. Ventralizing signal determined by protease
activation in Drosophila embryogenesis. Nature 368:548-551.

Sorensen SB, Bech LM, Meldal M, Breddam K. 1993. Mutational
replacements of the amino acid residues forming the hydrophobic S4 binding
pocket of subtilisin 309 from Bacillus lentus. Biochemistry
32:8994-8999.

Sprang S, Standing T, Fletterick RJ, Stroud RM, Finer-Moore J, Xuong NH,
Hamlin R, Rutter WJ, Craik CS. 1987. The three-dimensional structure
of Asn 102 mutant of trypsin: Role of Asp 102 in serine protease
catalysis. Science 237:905-909.

Stein RL, Strimpler AM, Hori H, Powers JC. 1987. Catalysis by human
leukocyte elastase: Mechanistic insights into specificity requirements.
Biochemistry 26:1301-1305.

Steitz TA, Shulman RG. 1982. Crystallographic and NMR studies of the
serine proteases. Annu Rev Biophys Bioeng 11:419-444.

Stroud RM. 1974. A family of protein-cutting proteins. Sci Am 231:74-88.

Svendsen I, Jensen MR, Breddam K. 1991. The primary structure of the
glutamic acid-specific protease of Streptomyces griseus. FEBS Lett
292:165-167.

Takeuchi Y, Noguchi S, Satow Y, Kojima S, Kumagai I, Miura K, Nakamura
KT, Mitsui Y. 1991a. Molecular recognition at the active site of
subtilisin BPN': Crystallographic studies using genetically engineered
proteinaceous inhibitor SSI (Streptomyces subtilisin inhibitor). Protein Eng
4:501-508.

Takeuchi Y, Satow Y, Nakamura KT, Mitsui Y. 1991b. Refined crystal
structure of the complex of subtilisin BPN' and Streptomyces subtilisin
inhibitor at 1.8 Å resolution. J Mol Biol 221:309-325.

Teplyakov AV, van der Laan JM, Lammers AA, Kelders H, Kalk KH,
Misset O, Mulleners LSJM, Dijkstra BW. 1992. Protein engineering of the
high-alkaline serine protease PB92 from Bacillus alcalophilus: Functional
and structural consequences of mutation at the S4 substrate binding
pocket. Protein Eng 5:413-420.

Thompson RC, Blout ER. 1970. Dependence of the kinetic parameters for 
elastase-catalyzed amide hydrolysis on the length of peptide substrates. 
Proc Natl Acad Sci USA 67:1734-1743. 

Tsu CA, Perona JJ, Schellenberger V, Turck CW, Craik CS. 1994. The
substrate specificity of Uca pugilator collagenolytic serine protease 1
correlates with the bovine type I collagen cleavage sites. J Biol Chem
269:19565-19572.

Van den Ouweland AMW, Van Duijnhoven HLP, Keizer GD, Dorssers LCJ,
Van de Ven WJM. 1990. Structural homology between the human fur
gene product and the subtilisin-like protease encoded by yeast KEX2.
Nucleic Acids Res 18:664-674.

van der Laan JM, Teplyakov AV, Kelders H, Kalk KH, Misset O, Mulleners
LJSM, Dijkstra BW. 1992. Crystal structure of the high-alkaline serine
protease PB92 from Bacillus alcalophilus. Protein Eng 5:405-411.

Van de Ven WJ, Roebroek AJ, Van Duijnhoven HL. 1993. Structure and
function of eukaryotic proprotein processing enzymes of the subtilisin
family of serine proteases. Crit Rev Oncogen 4:115-136.

Warshel A, Naray-Szabo G, Sussman F, Hwang JK. 1989. How do serine 
proteases really work? Biochemistry 28:3629-3637. 

Watson HC, Shotton DM, Cox JC, Muirhead H. 1970. Three-dimensional
Fourier synthesis of tosyl-elastase at 3.5 Å resolution. Nature 225:806-811.







Wei AZ, Mayr I, Bode W. 1988. The refined 2.3 Å crystal structure of
human leukocyte elastase in a complex with a valine chloromethyl ketone
inhibitor. FEBS Lett 234:367-373.

Wells JA, Cunningham BC, Graycar TP, Estell DA. 1986. Importance of
hydrogen bond formation in stabilizing the transition state of subtilisin.
Philos Trans R Soc Lond A 317:415-423.

Wells JA, Cunningham BC, Graycar TP, Estell DA. 1987a. Recruitment of
substrate-specificity properties from one enzyme into a related one by
protein engineering. Proc Natl Acad Sci USA 84:5167-5171.

Wells JA, Cunningham BC, Graycar TP, Estell DA, Carter P. 1987b. On
the evolution of specificity and catalysis in subtilisin. Cold Spring
Harbor Symp Quant Biol 52:647-652.

Wells JA, Estell DA. 1988. Subtilisin: An enzyme designed to be engineered.
Trends Biochem Sci 13:291-297.

Wells JA, Powers DB, Bott RR, Graycar TP, Estell DA. 1987c. Designing
substrate specificity by protein engineering of electrostatic interactions.
Proc Natl Acad Sci USA 84:1219-1223.

Wilke ME, Higaki JN, Craik CS, Fletterick RJ. 1991. Crystallographic
analysis of trypsin G226A. A specificity pocket mutant of rat trypsin with
altered binding and catalysis. J Mol Biol 219:525-532.

Willett WS, Gillmor S, Perona JJ, Fletterick RJ, Craik CS. 1995. Engineered
metal regulation of trypsin substrate specificity. Biochemistry.
Forthcoming.



Wilson C, Mace J, Agard DA. 1991. Computational method for the design
of enzymes with altered substrate specificity. J Mol Biol 220:495-506.

Woodbury RG, Everitt MT, Sanada Y, Katunuma N, Lagunoff D, Neurath
H. 1978a. A major serine protease in rat skeletal muscle: Evidence for
its mast cell origin. Proc Natl Acad Sci USA 75:5311-5313.

Woodbury RG, Gruzenski CM, Lagunoff D. 1978b. Immunofluorescent lo- 
calization of a serine protease in rat small intestine. Proc Natl Acad Sci 
USA 75:2785-2789. 

Wright CS, Alden RA, Kraut J. 1969. Structure of subtilisin BPN' at 2.5 A 
resolution. Nature 221:235-242. 

Yoshida N, Everitt MT, Neurath H, Woodbury RG, Powers JC. 1980.
Substrate specificity of two chymotrypsin-like proteases from rat mast cells.
Studies with peptide 4-nitroanilides and comparison with cathepsin G.
Biochemistry 19:5799-5804.

Zerner B, Bender ML. 1964. The kinetic consequences of the acyl-enzyme
mechanism for the reactions of specific substrates with chymotrypsin.
J Am Chem Soc 86:3669-3674.

Zhou GW, Guo J, Huang W, Fletterick RJ, Scanlan TS. 1994. The three- 
dimensional structure of a catalytic antibody with active site similarity 
to serine proteases. Science 265:1059-1064. 



Exhibit 31







GENOMICS 44, 309-320 (1997)
ARTICLE NO. GE974845



Cloning of the TMPRSS2 Gene, Which Encodes a 
Novel Serine Protease with Transmembrane, 

LDLRA, and SRCR Domains 
and Maps to 21q22.3 

Ariane Paoloni-Giacobino,* Haiming Chen,* Manuel C. Peitsch,†
Colette Rossier,* and Stylianos E. Antonarakis*,‡,1

*Laboratory of Human Molecular Genetics, Department of Genetics and Microbiology,
Geneva University Medical School, Geneva; †Glaxo Institute for Molecular Biology,
Geneva; and ‡Division of Medical Genetics, Cantonal Hospital of Geneva,
1211 Geneva, Switzerland

Received March 24, 1997; accepted June 6, 1997 



To contribute to the development of the transcription map of human
chromosome 21 (HC21), we have used exon trapping from pools of
HC21-specific cosmids. Using selected trapped exons, we have identified
a novel gene (named TMPRSS2) that encodes a multimeric protein with a
serine protease domain. The TMPRSS2 3.8-kb mRNA is expressed strongly in
small intestine and weakly in several other tissues. The full-length
cDNA encodes a predicted protein of 492 amino acids that contains the
following domains: (i) A serine protease domain (aa 255-492) of the S1
family that probably cleaves at Arg or Lys residues. (ii) An SRCR
(scavenger receptor cysteine-rich) domain (aa 149-242) of group A (6
conserved Cys). This type of domain is involved in the binding to other
cell surface or extracellular molecules. (iii) An LDLRA (LDL receptor
class A) domain (aa 113-148). This type of domain forms a binding site
for calcium. (iv) A predicted transmembrane domain (aa 84-106). No
typical signal peptide was recognized. The gene was mapped to 21q22.3
between markers ERG and D21S56 in the same P1 as MX1. The physiological
role of TMPRSS2 and its involvement in trisomy 21 phenotypes or
monogenic disorders that map to HC21 are unknown. © 1997 Academic Press



INTRODUCTION 



Human chromosome 21 (HC21) is the smallest chromosome,
with a long arm (21q) of around 40 Mb, containing
approximately 600-1000 genes (reviewed in Antonarakis,
1993), and a short arm (21p) of around 10-15 Mb, which

Sequence data from this article have been deposited with the GenBank
Data Library under Accession Nos. U75329 (cDNA) and X88229, X88228,
X88321, X88043, and X88047 (trapped exons).
1 To whom correspondence should be addressed at Division de Génétique
Médicale, Centre Médical Universitaire, 1 rue Michel-Servet, 1211
Genève 4, Switzerland. Telephone: 41-22-7025707. Fax: 41-22-7025706.
E-mail: Stylianos.Antonarakis@medecine.unige.ch.



is highly homologous to those of the other four human 
acrocentric chromosomes. To date, about 75 HC21 genes
have been cloned and partially characterised [Genome
DataBase, http://gdbwww.gdb.org, and SWISS-PROT,
http://www.expasy.ch]. Trisomy for human chromosome
21 is the most common chromosomal abnormality at 
birth, leading to the phenotypes known as Down syn- 
drome (Epstein, 1989). In addition, the loci for several 
monogenic disorders have been mapped to HC21. Dense 
linkage maps and almost complete physical maps of 21q 
have already been obtained and are now extensively 
used for the characterization of HC21 genes and the ef- 
forts to determine the nucleotide sequence of HC21. The 
cloning and characterization of HC21 genes are a neces- 
sary step for the understanding of Down syndrome and 
the molecular etiology of monogenic disorders mapping 
on this chromosome. 

In our laboratory, systematic exon-trapping experi- 
ments have been performed to identify portions of 
HC21 genes, clone and characterize the corresponding 
full-length cDNAs and genes, and participate in the 
international effort to create a transcription map of 
HC21 (Cheng et al., 1994; Peterson et al., 1994; Tassone
et al., 1994; Lucente et al., 1995; Chen et al., 1996). We
report here the cloning of a novel serine protease gene 
(TMPRSS2), which is expressed mainly in the small 
intestine, but also in lower levels in several other tis- 
sues, and which maps to 21q22.3. The predicted poly- 
peptide of TMPRSS2 also contains a transmembrane 
domain, a scavenger receptor cysteine-rich (SRCR) do- 
main, and an LDL receptor class A (LDLRA) domain, 
and it probably belongs to the type II integral mem- 
brane proteins. The TMPRSS2 gene is homologous to, 
but different from, the human enteropeptidase gene, 
which maps to a different region of HC21 (21q21). 

MATERIALS AND METHODS 

Exon Trapping 

Pools of chromosome 21-specific cosmids from the LL21NC02 library
(kindly supplied by P. de Jong) were used in exon-trapping






0888-7543/97 $25.00
Copyright © 1997 by Academic Press
All rights of reproduction in any form reserved.






PAOLONI-GIACOBINO ET AL. 



experiments (Buckler et al., 1991; Church et al., 1994; Gibco BRL
Manual 18449-017). EcoRI- and PstI-digested cosmids were subcloned
into pSPL3 vector, and plasmid DNA was used to transfect Cos7
mammalian cells using LipofectACE (Gibco BRL). Total RNA was isolated
from Cos7 cells 24 h after transfection, cDNA was synthesized, and PCR
products were subcloned into pAMP10 vector by UDG (uracil DNA
glycosylase) cloning. After elimination of cryptically spliced,
pSPL3-derived clones by oligonucleotide screening, the inserts of
individual pAMP10 clones were subjected to nucleotide sequencing on an
ABI373A automated sequencer by dideoxy terminator fluorescence method
using Taq polymerase. Nucleic acid and amino acid homologies of the
resulting sequences were analyzed through BLASTN and BLASTX searches
of the nonredundant database (Altschul et al., 1990).

Cloning of TMPRSS2 cDNA 

The 216-bp PCR product derived from trapped exon HMC26A01 with
oligonucleotide primers (26A01A, 5'-GCCTGCGGGGTCAACTTGAAC-3', and
26A01B, 5'-GGCGGCTGTCACGATCCACTC-3') was used as a probe to screen
approximately 500,000 clones of a human heart λgt10 cDNA library
(Clontech HL.3026a). One positive clone (APGl) was isolated, and the
2.4-kb insert was subcloned into the pAMP10 vector and sequenced in
both directions using standard oligonucleotide walking protocols for
the ABI373 automated sequencer. The nucleotide sequence was verified
using RT-PCR products from intestine poly(A)+ mRNA.
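
As a quick sanity check on the two probe primers quoted above, the following sketch computes their length, GC content, and a rough melting temperature. The Wallace rule (2 °C per A/T, 4 °C per G/C) is an assumption introduced here for illustration only; it is a coarse estimate best suited to short oligonucleotides and is not a method described in the article. The function name `primer_stats` is hypothetical.

```python
# Primer sequences as quoted in the text (5'->3').
PRIMERS = {
    "26A01A": "GCCTGCGGGGTCAACTTGAAC",
    "26A01B": "GGCGGCTGTCACGATCCACTC",
}


def primer_stats(seq: str) -> dict:
    """Return length, GC fraction and Wallace-rule Tm (deg C) for a DNA oligo."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at != len(seq):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return {
        "length": len(seq),
        "gc_fraction": gc / len(seq),
        "tm_wallace": 2 * at + 4 * gc,  # rough estimate only (assumption)
    }


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, primer_stats(seq))
```

Both primers are 21-mers with similar estimated melting temperatures, which is what one would expect of a pair designed to work together in a single PCR.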

Chromosomal Mapping 

Two independent methods were used to assign TMPRSS2 to a 
human chromosome. First, PCR amplification of the trapped exon
HMC26A01 with specific oligonucleotide primers (26MAP1, 5'-GAG- 
GCrrTCTGCAGCTTCATC-3', and 26MAP2, 5'-CAATCGATGGCA- 
TTGGACGG-3') was performed on the genomic DNA from a panel 
of somatic cell hybrids with defined segments of HC21. Second, the 
insert of the initial trapped exon HMC26A01 was used to probe high- 
density filters of cosmids from the HC21-specific LL21NC02 library. 
Finally, PCR amplification using either oligonucleotide primers
26MAP1 and 26MAP2 or 26A01A and 26A01B was used on DNAs from
a panel of HC21-derived YACs.

5'- and 3'-RACE (Rapid Amplification of cDNA Ends)

To obtain the 5' end of the TMPRSS2 cDNA, 5'-RACE was performed on
human small intestine cDNA. From 1 μg of poly(A)+ RNA (Clontech
6547-1) cDNA was made with the Marathon cDNA Amplification kit
(K-1802-1), and 5'-RACE using nested PCR primers was carried out with
the enzyme Taq Expand High Fidelity (Boehringer Mannheim) according to
the manufacturer's protocol. The gene-specific primers were 26A01B
(see above) and AP26BB (5'-CCGCTGTGATGCACTATTCC-3'). In two different
experiments the same PCR product of 670 bp was generated and subjected
to nucleotide sequencing. 3'-RACE was carried out using gene-specific
primers
AP26G (5'-(XlTTCrrGGCTGTGCGAAAGC.3') and AP26K (5'-GTG- 
TGGCTTTGGCACTCTCTGC-3'), and a PCR product of approximately
2.0 kb was generated.

Northern Blot Analysis 

The cDNA clone APGl containing the complete coding sequence was used
to probe two Northern blots, each containing poly(A)+ RNA from eight
human adult tissues (Clontech 7759-1, Clontech 7760-1), and one
containing four fetal tissues (Clontech 7756-1). Northern blot
analysis was performed using standard protocols, with high-stringency
washing. A control hybridization using a human actin probe was used
for determination of the amount of RNA loaded in these Northern blots.

Comparative Protein Modeling 

The sequences of both LDLRA and protease domains of TMPRSS2 were
submitted to the SWISS-MODEL automated comparative protein modeling
server (Peitsch, 1995, 1996). The models were built as follows:

LDLRA domain. SWISS-MODEL could not automatically provide a 3D
structure of this domain since the degree of identity with the most
similar sequence of known 3D structure was less than [illegible].
Using BLAST (Altschul et al., 1990), we identified the Brookhaven
Protein Data Bank entry 1LDL (NMR structure of the LDLRA domain)
(Daly et al., 1995) as the suitable modeling template. We then aligned
the TMPRSS2 LDLRA domain with the sequence of 1LDL and submitted the
sequence alignment to SWISS-MODEL using the Optimise mode.

Serine protease domain. This domain was modeled using the First
Approach mode of SWISS-MODEL, which provides fully automated template
identification and multiple sequence alignment prior to model
building. Chymotrypsin (P17538) was identified as a suitable modeling
template. The template and TMPRSS2 protease sequences were
automatically aligned and the model generation proceeded to the end
without human intervention. Sequence to structure fitness analysis
using both 3D-1D profiles (Lüthy et al., 1992) and ProsaII (Sippl,
1993) did not show any obvious discrepancies. The coordinates of both
the LDLRA and the serine protease domain of TMPRSS2 can be found in
the SWISS-MODEL Repository
(http://www.expasy.ch/swissmod/swmr-top.html).



RESULTS 



Exon Trapping Identified a Clone with Homology to
Human Proteases

To clone partial gene sequences from human chromosome 21 we have used
pools of cosmids (from the LL21NC02-Q library) in an exon-trapping
experiment and have identified more than 550 different potential exons
(Chen et al., 1996). One trapped sequence, HMC26A01 (GenBank X88229)
of 216 bp, showed a strong homology to a large list of serine proteases
from human and other species. BLASTX analysis, for example, revealed a
55% amino acid identity to human prostasin (GenBank L41351;
P = 1.3e-15). Other representative homologies included human elastase
(P08218), Erinaceus europaeus plasminogen (U33171), and pig
coagulation factor IX (P16293). Because this HMC26A01 trapped sequence
was probably derived from an undescribed human serine protease, we set
out to clone and initially characterize the full-length cDNA of the
corresponding human gene.

Isolation of Full-Length TMPRSS2 Coding Sequences

Clone HMC26A01 was used to screen approximately 500,000 clones of a
human heart λgt10 cDNA library (this library was chosen because of the
expression pattern in Northern blots; see below). One positive clone
(APGl), containing a 2.4-kb-long insert, was obtained, subcloned into
the pAMP10 vector, and subjected to nucleotide sequencing. 5'-RACE
from intestinal mRNA (again chosen because of the expression pattern)
using oligonucleotides close to the 5' end of the APGl clone extended
the 5'UTR sequence by about 150 nucleotides. Sequence analysis from
both strands revealed an open reading frame of 492 amino acids
starting from the most N-terminal methionine codon. The 3'UTR from the
original clone APGl was approximately 0.95 kb. Figure 1 shows the
complete nucleotide







CLONING AND MAPPING OF TMPRSS2 GENE 



[FIG. 1. Complete nucleotide and predicted amino acid sequence of
TMPRSS2. The figure and its legend are not legible in this copy.]




. .agggcacctctcccccgctctctctgcaas /TGCGCAGCAA AATCGGTGTS/ a^gastcagcctcaaccccgggoogggacc. . . 

C110 V149

. .aacccatggacaaccctccccctcgcgcas /TTCGCCTCTA TSGCCTATAA/ a£gagcacggggcagcacccgccgagcgac. . . 

R150 K191

. .cgcgaccagaactccccgccccccccgcoa /TCATCCCTOT tctttaccct/ a£acaggcaagttcacccggagtcccccct . , . 



D229 



C241 



. .ctgagacactgagcccctcctcccccccoa /ACCTCTTAAC ACTTTCAACG/ g^acgcgcggcccaggcccggcaagcaggc . . . 

P301 D959 

. .ggcccactgcgccccccccccccgaaacas /ACCTACTGAA GACGACAAAG/ gtQaorqctQcccctQqQcacacaggactgc . . . 

G391 

, . tgggagcccaacaagtccccccgtccccftg /CGAAGACCTC TTCrrGCCAG/ ofcaacccaacacctccaccccaccctcggcc . . . 

K393 0436 

. .crgccccccgcaccccgccgcgccccacas /GGTGACAGTG ATGAAGGCAA/ aj;aaccatcctgccccccccctgactgtgct . . . 

G439 N491 

. .caccccctcccctcccacccgaacaggcAfl /ACCGCj^aatccacacggccctcgcccccgacgccgp (3UTR) . . . 

G492 • 

FIG. 2. Intron/exon junctions of the TMPRSS2 gene as determined by comparison of the cDNA sequence to the publicly available
sequences of the human P1 clone 35-H5-C8 (Martin et al., 1994; GenBank Accession Nos. L35675-L35682).



and predicted amino acid sequence of TMPRSS2. This 
cDNA was verified by RT-PCR amplifications from in- 
testinal RNA using pairs of oligonucleotide primers 
from the cDNA sequence. Interestingly, no ESTs iden- 
tical to portions of the TMPRSS2 cDNA sequence were 
identified in the dbEST database of GenBank (search 
of February 18, 1997). A number of additional exons 
from the Chen et al. (1996) study were identical to
portions of the TMPRSS2 cDNA, including HMC44E11 
(GenBank X88043), HMC26A05 (GenBank X88228), 
HMC19A07 (GenBank X88321), and HMC44D02 
(GenBank X88047). 

Intron / Exon Junctions 

Homology searches with sequences available in the public databases
revealed identity of discontinuous regions of the TMPRSS2 cDNA with
portions of human P1 clone 35-H5-C8, which was sequenced by Martin and
co-workers (Martin et al., 1994; GenBank Accession Nos.
L35675-L35682). The comparison of the cDNA sequence of TMPRSS2 with
the genomic sequence of human P1 revealed intron/exon junctions that
are shown in Fig. 2. Not all such junctions are reported in the figure
since the sequence of the entire P1 clone was not available in the
public databases. It is likely that there are additional introns 5' to
codon 110 and between codons 191 and 229 and codons 241 and 301.
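
Junctions found by aligning a cDNA against genomic sequence can be checked against the usual splice-site pattern, in which an intron begins with GT and ends with AG. The sketch below shows that check. The GT-AG rule is general background rather than something stated in the text, the function names are hypothetical, and the example sequence is an invented placeholder, not TMPRSS2 sequence.

```python
def is_canonical_intron(intron: str) -> bool:
    """True if the intron starts with the GT donor and ends with the AG acceptor."""
    s = intron.lower()
    return len(s) >= 4 and s.startswith("gt") and s.endswith("ag")


def split_junction(genomic: str, exon_end: int, exon_start: int):
    """Split genomic sequence into (upstream exon, intron, downstream exon).

    exon_end   -- 0-based index just past the last base of the upstream exon
    exon_start -- 0-based index of the first base of the downstream exon
    """
    if not 0 <= exon_end <= exon_start <= len(genomic):
        raise ValueError("junction coordinates out of order")
    return genomic[:exon_end], genomic[exon_end:exon_start], genomic[exon_start:]


if __name__ == "__main__":
    # Invented toy locus: exon (upper case), intron (lower case), exon.
    toy = "ACGTTG" + "gtaagtccctttcag" + "GCTTAC"
    upstream, intron, downstream = split_junction(toy, 6, 21)
    print(upstream, intron, downstream, is_canonical_intron(intron))
```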

Mapping of TMPRSS2 to Chromosome 21 

PCR amplification was performed with oligonucleotide primers 26MAP1
and 26MAP2 on genomic DNA from rodent-human somatic cell hybrids that
contained either single human chromosomes (NIGMS 2; Drwinga et al.,
1993) or specific segments of HC21 (Patterson et al., 1993). The
expected 155-bp PCR product was present in somatic cell hybrids WAV17,
E7b, 725, 2Fur1, R50-3, GA9-3, 9528C-1, 1881C-13b, 8q-, ACEM 2-10d,
JC6A, and 1x4; in contrast, somatic cell hybrids 21q+, 6918-8a1, and
MRC2-G did not show amplification (data not shown). These data
localized this human protease to the region 21q22.3 between markers
ERG and D21S56 (Fig. 3).
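
The reasoning behind hybrid-panel mapping can be sketched as set arithmetic: the locus must lie in a segment carried by every PCR-positive hybrid and by none of the PCR-negative ones. The code below is an illustration under stated assumptions. The chromosome is divided into arbitrary numbered bins, and the hybrid names and their bin contents are invented; they are not the actual breakpoints of the hybrids listed above.

```python
def candidate_bins(retained, positive, negative):
    """Return the chromosome bins consistent with the PCR results.

    retained -- dict mapping hybrid name to the set of bins it carries
    positive -- names of hybrids that gave the PCR product
    negative -- names of hybrids that did not
    """
    if not positive:
        raise ValueError("need at least one positive hybrid")
    # The locus must be present in every positive hybrid ...
    region = set.intersection(*(set(retained[h]) for h in positive))
    # ... and absent from every negative hybrid.
    for h in negative:
        region -= set(retained[h])
    return region


if __name__ == "__main__":
    # Hypothetical panel over bins 0..9 (0 = centromeric, 9 = telomeric).
    panel = {
        "H1": set(range(0, 10)),  # whole arm
        "H2": set(range(5, 10)),  # distal half
        "H4": {8, 9},             # most distal segment only
        "H5": set(range(0, 6)),   # proximal segment
    }
    print(sorted(candidate_bins(panel, {"H1", "H2"}, {"H4", "H5"})))
```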

We used exon HMC26A01 to probe a subset of the cosmid library
LL21NC02. One cosmid, Q20A3, was identified as positive. PCR on this
cosmid with the same primers 26MAP1 and 26MAP2 produced the expected
155-bp fragment, confirming that Q20A3 contained this exon of the
TMPRSS2 gene. Yeast DNA from 79 YAC clones, chosen to cover almost all
of HC21 (Chu-



FIG. 3. Schematic representation of the mapping position of the
TMPRSS2 gene on chromosome 21 as resulted from PCR amplification of
somatic cell hybrids and sequence identities with a chromosome 21 P1
clone (see Results). Representative results from PCR amplification
using oligonucleotide primers 26MAP1/26MAP2 (see text) are also shown.



[Northern blot image; size markers at 9.5, 7.5, 4.4, 2.4, and 1.35 kb.]

FIG. 4. Northern blot analysis using the TMPRSS2 cDNA as hybridization probe. The RNA filters are from Clontech (Cat. Nos. 7750-,
7760-1, 7759-1, and 7756-1) and contain 2 μg of poly(A)+ mRNA per tissue indicated. The thick arrow shows the 3.8-kb mRNA species,
while the thin arrow depicts the faint 2.0-kb mRNA.




makov et al., 1992), was used for PCR amplification with the two pairs
of oligonucleotide primers 26MAP1-26MAP2 and AP26G
(5'-GGTTCTGGCTGTGCCAAGC-3')-AP26H (5'-CCAATGTGCAGGTGGAGACC-3') in the
3'UTR region. No positive YACs were identified. Many single YACs in
21q22.3 from the collection of Chumakov et al. (1992) were also tested
by PCR with these primers and no amplification was observed. The
absence of positive YACs for this human TMPRSS2 gene suggests either
that the HC21 contig (Chumakov et al., 1992) in the region between
markers ERG and D21S56 contains at least one gap or that the YAC
clones available to our laboratory have accumulated deletions.

As described above, discontinuous regions of the TMPRSS2 cDNA were
identical to portions of human P1 clone 35-H5-C8, which was sequenced
by Martin and co-workers (Martin et al., 1994; GenBank Accession Nos.
L35675-L35682). This P1 also contained gene MX1, which maps to 21q22.3
in the interval between ERG and D21S56 (Fig. 3). Therefore, this
sequence identity of TMPRSS2 with portions of P1 35-H5-C8 is in
agreement with the mapping position obtained using the somatic cell
hybrids.



Northern Blot Analysis

The insert of cDNA clone APGl was used as a probe against three
filters containing 2 μg of poly(A)+ RNA from 16 human adult tissues
and 4 human fetal tissues. A hybridization signal corresponding to an
mRNA species of approximately 3.8 kb was detected (Fig. 4). The



difference between the 2.4-kb cDNA clone APGl and the 3.8-kb RNA
species detected in the Northern blot is probably due to the
continuation of the 3'UTR downstream of the end of clone APGl.
3'-RACE from intestine RNA using oligonucleotides from clone APGl
(oligonucleotide primers AP26G, see above, and AP26K
5'-GTCTGGCTTTGGCACTCTCTGC-3') revealed a PCR product of approximately
2.0 kb, which corresponds to an mRNA length of 3.8 kb, compatible with
the results of the Northern blot analyses (data not shown). The
highest level of expression was observed in small intestine, but this
gene is also expressed in human adult heart, placenta, lung, thymus,
and prostate and in fetal brain and liver. Another weakly hybridizing
mRNA species of 2.0 kb was also observed in several tissues. This
could be due to alternative splicing, utilization of different
transcription start sites and polyadenylation signals, overlapping
transcripts, or, most likely, cross-hybridizing transcripts with
sequence homologies with TMPRSS2. A human actin probe was used to
control the amount of RNA loaded (data not shown). The expression of
the TMPRSS2 gene appears to be developmentally regulated since there
is strong expression in fetal brain but very little expression in
adult brain. In addition, in the lung, expression is high in the adult
tissue but low in the fetal tissue.

Type II Transmembrane Protein 

Protein prediction programs, which predict transmembrane domains,
including http://ulrec3.unil.ch/software/TMPRED_form.html
(Hofmann and Stoffel,



[Schematic with regions labeled Cytoplasmic, Transmembrane,
Extracellular, and Protease, ending at COOH.]

FIG. 5. Schematic representation of the different domains of TMPRSS2. Numbers correspond to codons of the full-length cDNA shown
in Fig. 1. For description of the domains see text.



[FIG. 6. Amino acid sequence comparisons of TMPRSS2 domains with
related proteins: (a) LDLRA domain, (b) group A SRCR domains,
(c) serine protease sequence motifs. The alignments and the legend are
not legible in this copy.]



1993), suggested that amino acids 84-106 of TMPRSS2 were hydrophobic
and likely to be a transmembrane domain (Figs. 1 and 5). This
hydrophobic sequence is not preceded by a recognizable leader
sequence. These findings are compatible with a type II integral
membrane protein in which the amino-terminus is at the cytoplasmic
side of the membrane (Parks and Lamb, 1993). These features (a type II
integral membrane polypeptide with an extracellular protease domain)
are similar to those of mammalian hepsins (Leytus et al., 1988; Tsuji
et al., 1991). This latter protein is important for cell growth and
maintenance of normal cell morphology (Kurachi et al., 1994); however,
the underlying mechanisms for the biological activities are unknown.
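
A hydrophobic stretch of about 20 residues is the usual signature of a membrane-spanning helix. The sketch below scans a sequence with a sliding-window average of Kyte-Doolittle hydropathy values. This is a simpler stand-in chosen for illustration and is not the TMpred algorithm cited in the text; the window of 19 residues and the cutoff of 1.6 are conventional choices assumed here, and the test sequence is invented rather than taken from TMPRSS2.

```python
# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Return 0-based start positions of windows whose mean hydropathy >= threshold."""
    seq = seq.upper()
    scores = [KD[aa] for aa in seq]  # raises KeyError on non-standard residues
    starts = []
    for i in range(len(seq) - window + 1):
        mean = sum(scores[i:i + window]) / window
        if mean >= threshold:
            starts.append(i)
    return starts


if __name__ == "__main__":
    # Invented sequence: polar flank, 23 leucines, polar flank.
    toy = "DEKRSTNQ" * 3 + "L" * 23 + "DEKRSTNQ" * 3
    print(hydrophobic_windows(toy))
```

A run of consecutive window starts marks one candidate segment; the absence of a second such run near the amino-terminus would be consistent with the lack of a signal peptide, although a dedicated predictor is needed to make that call.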

LDLRA Domain

In addition to the transmembrane domain, TMPRSS2 contains a protein
motif of the so-called LDLRA (low-density lipoprotein receptor A)
domain extending from Cys113 to Cys148 (Figs. 1 and 5). This
structural motif (PDOC00929;
http://www.expasy.ch/cgi-bin/get-prodoc-entry?PDOC00929) was found in
the low-density lipoprotein receptor gene, which contains seven
successive such domains (Südhof et al., 1985). A typical LDLRA domain
is about 40 amino acids long and contains 6 disulfide-bound cysteines
(cysteines 113, 120, 126, 133, 139, and 148 in TMPRSS2). Similar
domains have been found in both extracellular and membrane proteins,
including the VLDL receptor; gp330; Drosophila putative vitellogenin
receptor; human enterokinase; complement factor I; complement
components C6, C7, C8, and C9; perlecan; PKD1; and vertebrate integral
membrane protein DGCR2/IDD (Daly et al., 1995). The amino acid
comparison of the single LDLRA domain of TMPRSS2 with other similar
domains is shown in Fig. 6a. The predicted 3D structure of this domain
and its comparison with the first such domain of the LDLR is shown in
Fig. 7a. The LDLRA domains form the binding site for LDL and calcium;
the acidic residues between the fourth and the sixth cysteines are
important for high-affinity binding of positively charged sequences in
LDLR ligands (van Driel et al., 1987; Mahley, 1988).

SRCR Domain 

An SRCR domain (Resnick et al., 1994) was also identified in TMPRSS2
extending from Val149 to Leu242. SRCR domains are approximately 100
amino acids long and rich in cysteine. The overall consensus sequence
derived from more than 40 such domains from different proteins
revealed a consensus sequence at 41 of 101 residues (Resnick et al.,
1994). Two groups of SRCR domains are recognized, group A and group B,
differing in the number of conserved cysteines. The SRCR domain of
TMPRSS2 contains the pattern compatible with group A SRCR. The
sequence homology to different examples of group A SRCR domains is
shown in Fig. 6b. The SRCR domains were first found in type I
macrophage scavenger receptor (Freeman et al., 1990) but subsequently
in many other sequences (for a comprehensive list, see Resnick et al.,
1994). The SRCR domain is reminiscent of but different from
immunoglobulin domains. Proteins with SRCR domains are either at the
cell surface or secreted into plasma or other body fluids. Some
proteins such as the WC1 antigen or M130 contain nine or more such
domains while others such as the MSR (macrophage scavenger receptor
type I) and the secreted CFI (complement factor I) or cyclophilin C
contain only one domain. The biochemical functions of the SRCR domain
have not been established with certainty; however, most of these
domains are involved with binding to the cell surface or extracellular
molecules.

Protease Domain 

The most striking feature of the TMPRSS2 predicted polypeptide is its
similarity with members of the serine protease family of proteins. The
serine protease domain extends from amino acid residue Arg255 to the
carboxyl-terminus of the predicted polypeptide. There is approximately
45-55% identity with several members of the serine protease family;
the best similarities are with human hepsin (X07002), human
enterokinase (P98073), and human kallikrein (P03952). The features of
the protease domain of TMPRSS2 are compatible with the S1 family of
the SA clan of serine-type peptidases as characterized by Rawlings and
Barrett (1994). The prototype of this family is chymotrypsin and the
3D structure of some of its members has already been resolved. For a
comprehensive list of the S1 serine-type peptidases see SWISS-PROT
(http://www.expasy.ch/cgi-bin/lists?peptidas.txt). TMPRSS2 exhibits
conservation of serine protease sequence motifs (Fig. 6c); in
particular, the active site residues can be identified as His296,
Asp345, and Ser441. TMPRSS2 is predicted to cleave after Lys or Arg
residues since it contains Asp435 at the base of the specificity
pocket (S1 subsite) that binds to the substrate. The predicted 3D
structure of the protease domain of TMPRSS2 is shown in Fig. 7b. The
protein model was built using the SWISS-MODEL server for automated
comparative protein modeling (Peitsch, 1995, 1996) as described under
Materials and Methods. It is of interest that TMPRSS2 is highly
homologous to hepsin, another protease that contains a transmembrane
domain and is thus a type II integral membrane protein with its
protease domain



FIG. 7. (a) Ribbon model of the LDLRA domain of TMPRSS2. The NMR structure of the LDL receptor A domain is depicted in blue
while the TMPRSS2 LDLRA homology domain is shown in red. The three disulfide bonds are shown in yellow. (b) Ribbon model of the
protease domain of TMPRSS2. The full protein structure is depicted as a gray ribbon, while the active sites are shown with colored residues
(His296, blue; Asp345, red; Ser441, green). The side chain of Asp435, which determines the Lys/Arg specificity of the TMPRSS2 protease,
is shown in red. The three disulfide bonds are depicted in yellow, while two free cysteines are shown as orange bars.







in the extracellular space (Kurachi et al., 1994; Leytus et al., 1988;
Tsuji et al., 1991). TMPRSS2 contains nine conserved cysteine residues
which by homology to other proteases most likely form the following
intrasubunit disulfide bonds: Cys826-Cys842, Cys926-Cys993,
Cys957-Cys972, and Cys983-Cys1011, and the intersubunit disulfide bond
involving Cys758-Cys912, which probably joins the catalytic protease
subunit with the nonprotease part of the polypeptide. The protease
domain does not contain potential N-glycosylation sites while the
remainder of the predicted polypeptide contains two such potential
sites (N213, in the SRCR domain, and N249). The amino-terminal Ile of
the protease domain is preceded by Arg in the context of a peptide
sequence Arg-Ile-Val-Gly-Gly (RIVGG), which is typical for the
proteolytic activation site of many serine protease zymogens (Rawlings
and Barrett, 1994). The potential cleavage between Arg and Ile, which
would be similar to the activation mechanism of other serine protease
zymogens, would convert TMPRSS2 to an activated form consisting of a
nonprotease and a protease catalytic subunit linked by a disulfide
bond that most probably involves Cys758 and Cys912.
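
The proposed activation step can be illustrated as a simple string operation: locate the RIVGG motif and cut after its Arg so that Ile becomes the new amino-terminus of the catalytic chain. This is a minimal sketch with a hypothetical function name and an invented toy sequence. It does not model the disulfide bond that would keep the two chains linked.

```python
def activation_cleavage(seq, motif="RIVGG"):
    """Split a zymogen at the Arg-Ile bond of the first occurrence of the motif.

    Returns (n_terminal_chain, c_terminal_chain), or None if the motif is absent.
    """
    seq = seq.upper()
    i = seq.find(motif)
    if i == -1:
        return None
    cut = i + 1  # cleave after the Arg, so the new N-terminus is Ile
    return seq[:cut], seq[cut:]


if __name__ == "__main__":
    # Invented toy zymogen, not TMPRSS2 sequence.
    print(activation_cleavage("MAAACRIVGGSAA"))
```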

DISCUSSION 

In this paper we describe the cloning, chromosomal 
mapping, and initial characterization of a novel gene 
that maps on human chromosome 21q22.3 and encodes 
a polypeptide with multiple recognizable domains, 
namely LDLRA, SRCR, and serine protease domains. 
In addition, the presence of a transmembrane domain 
and the absence of a signal peptide suggest that this is 
a type II integral membrane protein. More biochemical 
experiments are necessary to further characterize the 
cellular localization of this protein and its physiological 
function. The biochemical events for the activation of 
the probable serine protease activity are unknown but 
are likely to be similar to those described above. It is of 
interest that the predicted TMPRSS2 protein contains 
additional domains (LDLRA and SRCR) that are poten- 
tially involved in binding with extracellular molecules 
or the cell surface. The molecules that are cleaved by or 
that bind to TMPRSS2 are unknown. There are several 
tissues that are shown by Northern blot analysis to 
express the TMPRSS2 gene. The site of the strongest 
expression is the small intestine; however, other tissues
including heart, lung, and liver also showed a significant
amount of TMPRSS2 mRNA. The function of
this protein in these tissues remains elusive.

Are there any monogenic disorders associated with
TMPRSS2? Several monogenic phenotypes due to
mutations in unknown genes have been mapped by
linkage analysis to chromosome 21q22.3; these include
APECED (Aaltonen et al., 1994; OMIM 240300), an
autoimmune disorder; two forms of autosomal recessive
deafness (Bonne-Tamir et al., 1996; Veske et al.,
1996; OMIM 601072); Knobloch syndrome (Sertie et al.,
1996; OMIM 267750); one locus for manic depressive
illness (Smyth et al., 1997; OMIM 125480); and one
locus for holoprosencephaly (Muenke et al., 1995;
OMIM 236100). All of these phenotypes are mapped
more distal to TMPRSS2, and it is therefore unlikely
that TMPRSS2 is a candidate gene for any of these
disorders.

Many human disorders are due to deficiency of other
serine proteases. For example, deficiencies of coagulation
factors such as Factor XII (OMIM 234000), Factor
X (OMIM 227600), Factor IX (OMIM 306900), and Factor
VII (OMIM 227500) belong to these disorders. Additional
examples of such disorders are enterokinase deficiency
(Hadorn et al., 1969; OMIM 226200), trypsinogen
deficiency (Townes, 1965; OMIM 276000), and
hereditary pancreatitis due to mutations in the cationic
trypsinogen gene (Whitcomb et al., 1996). The generation
of mice with targeted disruption of the mouse
TMPRSS2 gene will enhance our understanding of the
function of this gene and will provide candidate phenotypes
for further investigation.

Is the overexpression of three copies of TMPRSS2
involved in one of the phenotypes of Down syndrome?
TMPRSS2 maps outside the so-called Down syndrome
critical region (DSCR; between markers D21S17 and
ETS2), triplication of which is associated with many
phenotypes of Down syndrome (Delabar et al., 1993).
However, the existence of a single DSCR has recently
been challenged, since rare patients with proximal trisomy
21 not including the D21S17-ETS2 region displayed
some of the phenotypes of Down syndrome (Korenberg
et al., 1994). In addition, a wider region from
D21S17 to and including MX1 was associated with several
phenotypes, including the heart defect and some
dysmorphic features of the syndrome (Delabar et al.,
1993; Korenberg et al., 1994). Since the TMPRSS2 gene
is within this interval, it is formally a candidate for
some phenotype(s) of Down syndrome. Transgenic mice
that overexpress the murine extracellular protein
urokinase-type plasminogen activator have been shown
to exhibit abnormal phenotypes (learning disabilities)
(Meiri et al., 1994). The study of transgenic mice that
overexpress the murine homologue of the human TMPRSS2
gene may contribute to the understanding of the
potential involvement of this gene in the pathogenesis
of Down syndrome. A mouse model with partial trisomy
16 (which corresponds to a partial human trisomy 21
from APP to MX1) has recently been made (Reeves et
al., 1995). It would be of interest to know if the murine
homologue of the TMPRSS2 gene is included in the
triplicated part of mouse chromosome 16.



ACKNOWLEDGMENTS 







We thank P. de Jong for the HC21-specific cosmid
LL21NC02-Q, D. Patterson for the chromosome 21-specific somatic
cell hybrids, and H. S. Scott for critically reading the manuscript.
This study was supported by Grant 31-40600.94 from the
FNRS, the European Union Grants GENE-CT93-0015 and '?
970302, and funds from the University and the Cantonal Hospital
of Geneva.






REFERENCES 



Aaltonen, J., Bjorses, P., Sandkuijl, L., Perheentupa, J., and Peltonen,
L. (1994). An autosomal locus causing autoimmune disease:
Autoimmune polyglandular disease type I assigned to chromosome
21. Nature Genet. 8: 83-87.

Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J.
(1990). Basic local alignment search tool. J. Mol. Biol. 215: 403-410.

Antonarakis, S. E. (1993). Human chromosome 21: Genome mapping
and exploration, circa 1993. Trends Genet. 9: 142-148.

Bonne-Tamir, B., DeStefano, A. L., Briggs, C. E., Adair, R., Franklyn,
B., Weiss, S., Korostishevsky, M., Frydman, M., Baldwin, C. T.,
and Farrer, L. A. (1996). Linkage of congenital recessive deafness
(gene DFNB10) to chromosome 21q22.3. Am. J. Hum. Genet. 58:
1254-1259.

Buckler, A. J., Chang, D. D., Graw, S. L., Brook, J. D., Haber, D. A.,
Sharp, P. A., and Housman, D. E. (1991). Exon amplification: A
strategy to isolate mammalian genes based on RNA splicing. Proc.
Natl. Acad. Sci. USA 88: 4005-4009.

Chen, H., Chrast, R., Rossier, C., Morris, M. A., Lalioti, M. D., and
Antonarakis, S. E. (1996). Cloning of 559 potential exons of genes
of human chromosome 21 by exon trapping. Genome Res. 6: 747-760.

Cheng, J. F., Boyartchuk, V., and Zhu, Y. W. (1994). Isolation and
mapping of human chromosome 21 cDNA: Progress in constructing
a chromosome 21 expression map. Genomics 23: 75-84.

Chumakov, I., Rigault, P., Guillou, S., Ougen, P., Billaut, A., Guasconi,
G., Gervy, P., LeGall, I., Soularue, P., Grinas, L., Bougueleret,
L., Bellanne-Chantelot, C., Lacroix, B., Barillot, E., Gesnouin, P.,
Pook, S., Vaysseix, G., Frelat, G., Schmitz, A., Sambucy, J. L.,
Bosch, A., Estivill, X., Weissenbach, J., Vignal, A., Riethman, H.,
Cox, D., Patterson, D., Gardiner, K., Hattori, M., Sakaki, Y., Ichikawa,
H., Ohki, M., Le Paslier, D., Heilig, R., Antonarakis, S. E.,
and Cohen, D. (1992). A continuum of overlapping clones spanning
the entire chromosome 21q. Nature 359: 380-386.

Church, D. M., Stotler, C. J., Rutter, J. L., Murrell, J. R., Trofatter,
J. A., and Buckler, A. J. (1994). Isolation of genes from complex
sources of mammalian genomic DNA using exon amplification.
Nature Genet. 6: 98-105.

Daly, N. L., Scanlon, M. J., Djordjevic, J. T., Kroon, P. A., and Smith,
R. (1995). Three-dimensional structure of a cysteine-rich repeat
from the low-density lipoprotein receptor. Proc. Natl. Acad. Sci.
USA 92: 6334-6338.

Delabar, J. M., Theophile, D., Rahmani, Z., Chettouh, Z., Blouin,
J. L., Prieur, M., Noel, B., and Sinet, P. M. (1993). Molecular mapping
of twenty-four features of Down syndrome on chromosome
21. Eur. J. Hum. Genet. 1: 114-124.

Drwinga, H. L., Toji, L. H., Kim, C. H., Greene, A. E., and Mulivor,
R. A. (1993). NIGMS human/rodent somatic cell hybrid mapping
panels 1 and 2. Genomics 16: 311-314.

Epstein, C. J. (1989). Down syndrome, trisomy 21. In "The Metabolic
Basis of Inherited Disease" (C. R. Scriver, A. L. Beaudet, W. S. Sly,
and D. Valle, Eds.), pp. 291-326. McGraw-Hill, New York.

Freeman, M., Ashkenas, J., Rees, D. J. G., Kingsley, D. M., Copeland,
N. G., Jenkins, N. A., and Krieger, M. (1990). An ancient, highly
conserved family of cysteine-rich protein domains revealed by cloning
type I and type II murine macrophage scavenger receptors.
Proc. Natl. Acad. Sci. USA 87: 8810-8814.

Hadorn, B., Tarlow, M. J., Lloyd, J. K., and Wolff, O. H. (1969).
Intestinal enterokinase deficiency. Lancet 1: 812-813.

Hofmann, K., and Stoffel, W. (1993). Tmbase - A database of membrane
spanning proteins segments. Biol. Chem. Hoppe-Seyler 347:
166.

Korenberg, J. R., Chen, X. N., Schipper, R., Sun, Z., Gonsky, R.,
Gerwehr, S., Carpenter, N., Daumer, D., Dignan, P., Disteche, C.,
Graham, J. M., Hugdins, L., McGillivray, B., Miyazaki, K., Ogasawara,
N., Park, J. P., Pagon, R., Pueschel, S., Sack, G., Say, B.,
Schuffenhauer, S., Soukup, S., and Yamanaka, T. (1994). Down
syndrome phenotype: The consequences of chromosomal imbalance.
Proc. Natl. Acad. Sci. USA 91: 4997-5001.

Kurachi, K., Torres-Rosado, A., and Tsuji, A. (1994). Hepsin. Methods
Enzymol. 244: 100-114.

Leytus, S. P., Loeb, K. R., Hagen, S. F., Kurachi, K., and Davie, E. W.
(1988). A novel trypsin-like serine protease (Hepsin) with a putative
transmembrane domain expressed by human liver and hepatoma
cells. Biochemistry 27: 1067-1074.

Lucente, D., Chen, H. M., Shea, D., Samec, S. N., Rutter, M., Chrast,
R., Rossier, C., Buckler, A., Antonarakis, S. E., and McCormick,
M. K. (1995). Localization of 102 exons to a 2.5 Mb region of
chromosome 21 involved in Down syndrome. Hum. Mol. Genet. 4:
1305-1311.

Luthy, R., Bowie, J. U., and Eisenberg, D. (1992). Assessment of
protein models with three-dimensional profiles. Nature 356: 83-85.

Mahley, R. W. (1988). Apolipoprotein E: Cholesterol transport pro- 
tein with expanding role in cell biology. Science 240: 622-630. 

Martin, C. H., Bondoc, M. M., Chiang, A., Cloutier, T., Davis, C. A.,
Ericsson, C. L., Jaklevic, M. A., Kim, R. J., Lee, M. T., Li, M., Mayeda,
C. A., Steiert-El Kheir, A., and Palazzolo, M. J. (1994).
Sequencing of the MX1 region of human chromosome 21. [Unpublished]
[http://www2.ncbi.nlm.nih.gov/cgi-bin/genbank7L35675]

Meiri, N., Masos, T., Rosenblum, K., Miskin, R., and Dudai, Y. (1994).
Overexpression of urokinase-type plasminogen activator in
transgenic mice is correlated with impaired learning. Proc. Natl.
Acad. Sci. USA 91: 3196-3200.

Muenke, M., Bone, L. J., Mitchell, H. F., Hart, I., Walton, K.,
Hall-Johnson, K., Ippel, E. F., Dietz-Band, J., Kvaloy, K., Fan, C.-M.,
Tessier-Lavigne, M., and Patterson, D. (1995). Physical mapping
of the holoprosencephaly critical region in 21q22.3, exclusion of
SIM2 as a candidate gene for holoprosencephaly, and mapping of
SIM2 to a region of chromosome 21 important for Down syndrome.
Am. J. Hum. Genet. 57: 1074-1079.

Parks, G. D., and Lamb, R. A. (1993). Role of NH2-terminal positively
charged residues in establishing membrane protein topology. J.
Biol. Chem. 268: 19101-19109.

Patterson, D., Rahmani, Z., Donaldson, D., Gardiner, K., and Jones,
C. (1993). Physical mapping of chromosome 21. Prog. Clin. Biol.
Res. 384: 33-50.

Peitsch, M. C. (1995). Protein modelling by e-mail. Bio/Technology
13: 658-660.

Peitsch, M. C. (1996). ProMod and Swiss-Model: Internet-based tools
for automated comparative protein modelling. Biochem. Soc.
Trans. 24: 274-279.

Peterson, A., Patil, N., Robbins, C., Wang, L., Cox, D. R., and Myers,
R. M. (1994). A transcript map of the Down syndrome critical region
on chromosome 21. Hum. Mol. Genet. 3: 1735-1742.

Rawlings, N. D., and Barrett, A. J. (1994). Families of cysteine
peptidases. Methods Enzymol. 244: 19-61.

Reeves, R. H., Irving, N. G., Moran, T. H., Wohn, A., Kitt, C., Sisodia,
S. S., Schmidt, C., Bronson, R. T., and Davisson, M. T. (1995). A
mouse model for Down syndrome exhibits learning and behaviour
deficits. Nature Genet. 11: 177-184.

Resnick, D., Pearson, A., and Krieger, M. (1994). The SRCR superfamily:
A family reminiscent of the Ig superfamily. Trends Biochem.
Sci. 19: 5-8.

Sertie, A. L., Quimby, M., Moreira, E. S., Murray, J., Zatz, M.,
Antonarakis, S. E., and Passos-Bueno, M. R. (1996). A gene which
causes severe ocular alterations and occipital encephalocele
(Knobloch syndrome) is mapped to 21q22.3. Hum. Mol. Genet. 5:
843-847.

Sippl, M. J. (1993). Recognition of errors in three-dimensional
structures of proteins. Proteins Struct. Funct. Genet. 17: 355-362.

Smyth, C., Kalsi, G., Curtis, D., Brynjolfsson, J., O'Neill, J., Rifkin,
L., Moloney, E., Murphy, P., Petursson, H., and Gurling, H. (1997).
Two-locus admixture linkage analysis of bipolar and unipolar
affective disorder supports the presence of susceptibility loci on
chromosomes 11p15 and 21q22. Genomics 39: 271-278.

Sudhof, T. C., Goldstein, J. L., Brown, M. S., and Russell, D. W.
(1985). The LDL receptor gene: A mosaic of exons shared with
different proteins. Science 228: 815-822.

Tassone, F., Xu, N. X., Wade, H., Weissman, S., and Gardiner, K.
(1994). High density transcriptional mapping of chromosome 21
by hybridization selection. Am. J. Hum. Genet. 55: 272A.

Townes, P. L. (1965). Trypsinogen deficiency disease. J. Pediat. 66:
275-285.

Tsuji, A., Torres-Rosado, A., Arai, T., Le Beau, M. M., Lemons, R. S.,
Chou, S. H., and Kurachi, K. (1991). J. Biol. Chem. 266: 16948-16953.

van Driel, I. R., Goldstein, J. L., Sudhof, T. C., and Brown, M. S.
(1987). First cysteine-rich repeat in ligand-binding domain of low
density lipoprotein receptor binds Ca2+ and monoclonal antibodies,
but not lipoproteins. J. Biol. Chem. 262: 17443-17449.

Veske, A., Oehlmann, R., Younus, F., Mohyuddin, A., Muller-Myhsok,
B., Mehdi, S. Q., and Gal, A. (1996). Autosomal recessive
non-syndromic deafness locus (DFNB8) maps on chromosome 21q22 in
a large consanguineous kindred from Pakistan. Hum. Mol. Genet.
5: 165-168.

Whitcomb, D. C., Gorry, M. C., Preston, R. A., Furey, W.,
Sossenheimer, M. J., Ulrich, C. D., Martin, S. P., Gates, L. K., Jr.,
Amann, S. T., Toskes, P. P., Liddle, R., McGrath, K., Uomo, G.,
Post, J. C., and Ehrlich, G. D. (1996). Hereditary pancreatitis is
caused by a mutation in the cationic trypsinogen gene. Nature
Genet. 14: 141-145.







Exhibit 32 






New assay technologies for high-throughput screening 

Lauren Silverman, Robert Campbell and James R Broach* 



The use of high-throughput screening for early stage drug
discovery imposes several constraints on the format of assays
for therapeutic targets of interest. Homogeneous cell-free
assays based on energy transfer, fluorescence polarization
spectroscopy or fluorescence correlation spectroscopy
provide the sensitivity, ease, speed and resistance to
interference from test compounds needed to function in a
high-throughput screening mode. Similarly, novel cell-based
assays are now being adapted for high-throughput screening,
providing for in situ analysis of a variety of biological targets.
Finally, recent advances in assay miniaturization mark a
transition to ultra high-throughput screening, ensuring that
identification of lead compounds will not be the rate-limiting
step in finding new drugs.

Addresses

Cadus Pharmaceutical Corporation, 777 Old Saw Mill River Road,
Tarrytown, NY 10591-6705, USA

*Department of Molecular Biology, Princeton University, Princeton,
NJ 08544, USA; e-mail: jbroach@molecular.princeton.edu

Current Opinion in Chemical Biology 1998, 2:397-403

http://biomednet.com/elecref/1367593100300397

© Current Biology Ltd ISSN 1367-5931

Abbreviations 

CRE    cAMP response element
FCS    fluorescence correlation spectroscopy
GFP    green fluorescent protein
HTS    high-throughput screening



Introduction 

Continuing advances in molecular biology, human genetics 
and genomics have accelerated identification of the mech- 
anisms underlying a growing number of human diseases. 
This progress has increased the number of novel protein 
targets available for potential therapeutic intervention
by drug treatment. Concurrently, novel approaches in 
combinatorial chemistry and expanded collections of 
natural products have dramatically increased the number 
of compounds that can be tested for activity against these 
targets. The confluence of these two trends towards more 
potential targets and larger chemical libraries has greatly
stimulated adoption of high-throughput screening (HTS)
as the primary tool for early stage drug discovery. 

HTS is the process by which large numbers of compounds 
are tested, in an automated fashion, for activity as 
inhibitors or activators of a particular biological target, such 
as a cell surface receptor or a metabolic enzyme. Although
any assay performed on the bench top can, in theory,
be applied in HTS, conversion to an automated format
imposes certain constraints that affect the design of the
assay in practice. Procedures that are routine at the bench
are often extremely difficult to automate. Also, the more
steps required for an assay, the more difficult it is to automate
for HTS. The ideal assay is one that can be performed in a
single well with no manipulation other than addition
of the sample to be tested.

A number of assay formats have been developed or 
modified over the past few years to conform to the 
constraints imposed by HTS. These assay protocols can 
be divided into two groups: cell-free assays that measure
the biological activity of a relatively pure protein target and
cell-based assays that assess the activity of a target protein
by monitoring a biological response of a cell in which the
target protein resides. In either case, the protocols require
minimal manipulations, can be performed robotically in
relatively small volumes, yield robust responses and are
relatively impervious to perturbation by solvents and
compounds used in drug screening. In this review we
describe several of the more recently developed or
exploited assay protocols for HTS.

Cell-free assays 

The primary goal in adapting cell-free assays to HTS is
to minimize the number of steps required in setting up
the assay and in detecting the activity, be it an enzymatic
reaction or the binding of two components. This goal has
been met to a large extent by development of detection
systems that do not require separation of the product of the
reaction from substrate or from other components of the
assay mixture. Earlier approaches to such homogeneous
assay formats relied on proximity-dependent energy
transfer. The output of such assays derived from the
signal enhancement generated by bringing a source and a
distance-dependent amplifier close together. For example,
the β-particles of a low-energy radionuclide attached to
a ligand will stimulate the fluorescent emission of a
scintillant in a bead to which the ligand's receptor is
attached [1,2]. More recently, this detection method has
been applied to enzymatic reactions, such as that catalyzed
by topoisomerase I [3]. As another example of energy
transfer assay formats, the rare earth metal lanthanide,
Eu3+, when irradiated by light, can transfer its excitation
energy in a nonradiative process to the fluorescent protein,
allophycocyanin, if the two are in close proximity. This
can occur when a Eu3+-derivatized ligand binds to an
allophycocyanin-linked receptor [4,5] or a Eu3+-derivatized
anti-phosphotyrosine antibody binds to a detector-linked
phosphorylated substrate of a tyrosine kinase such as src
[6•]. Use of time-resolved fluorescent procedures assessing
emission at specific times following excitation enhances
the sensitivity of this technique, by reducing interference
from background fluorescence, from test compounds or
from assay components [6•,7•]. Finally, enzymatic assays
suitable for HTS and based on fluorescent resonant energy



398 Combinatorial chemistry



transfer between two different forms of green fluorescent
protein (GFP) have recently been described [8•].
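
The benefit of time-resolved detection mentioned above can be made concrete with a small calculation. The sketch below assumes single-exponential decays and uses illustrative lifetimes (about 1 ms for a lanthanide label, about 10 ns for background fluorescence) and gate settings that are assumptions for this example, not values from the review; the function name `gated_fraction` is likewise hypothetical.

```python
import math


def gated_fraction(lifetime, delay, window):
    """Fraction of a single-exponential emission collected in [delay, delay + window].

    All arguments are times in seconds.
    """
    return math.exp(-delay / lifetime) - math.exp(-(delay + window) / lifetime)


# Assumed, illustrative values: 50 microsecond delay, 400 microsecond counting window.
delay, window = 50e-6, 400e-6
label = gated_fraction(1e-3, delay, window)        # long-lived lanthanide emission
background = gated_fraction(10e-9, delay, window)  # short-lived background

print(f"label signal collected:      {label:.3f}")
print(f"background signal collected: {background:.3e}")
```

Under these assumptions roughly a third of the long-lived signal is still collected, while the short-lived background has decayed to a negligible level before the counting window opens.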

A number of investigators have exploited fluorescence polarization
spectroscopy (FPS) as the basis for homogeneous
HTS assays of both enzymatic and binding reactions.
When fluorescent molecules in solution are excited with
polarized light, the degree to which the emitted light
retains polarization depends on the extent to which the
fluorescent molecule rotates during the interval between
excitation and emission. The rapid rotation of small
fluorescent molecules in solution results in substantial
loss of polarization. If such small molecules bind to
larger molecules, their rotational diffusion is reduced and
the retention of polarization is correspondingly increased.
Thus, by measuring the relative intensity of emitted light
in the planes parallel and orthogonal to the plane of the
incident polarized light, the extent of rotation of a target
molecule, and inferentially, the extent of binding of the
target molecule to a larger component, can be calculated.
For instance, fluorescent polarization has been used to
detect the presence of specific drugs or hormones [9,10], to
assess antibody binding of fluorescein-conjugated peptides
[11] or to monitor DNA:DNA hybrid formation [12]. The
recent availability of a 96-well plate reader [13] with a
high sensitivity to fluorescein and fluorescein conjugates
has allowed development of 96-well based fluorescent
polarization assays. Such high-throughput assays for src
family tyrosine kinase activity [14•], for binding of
phosphopeptides to Src SH2 domains [15•], for interaction
between STAT1 and a γ-interferon receptor-derived
phosphotyrosine-containing peptide [16•] and for specific
protease activities [17,18•] have recently been described.
The sensitivity of fluorescence polarization, the ease and
speed with which such assays can be run and the resistance
of such assays to interference from absorptive compounds
commonly present in complex mixtures [15•] make this
procedure highly amenable to HTS.
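
The calculation described in this paragraph can be sketched as follows. The formulas are the standard definitions of polarization and anisotropy from intensities measured parallel and perpendicular to the excitation plane; the function names and the example intensities are hypothetical, and the bound-fraction estimate assumes that binding does not change the fluorophore's brightness, which the review does not state.

```python
def polarization(i_par, i_perp):
    """Polarization P = (I_par - I_perp) / (I_par + I_perp)."""
    return (i_par - i_perp) / (i_par + i_perp)


def anisotropy(i_par, i_perp):
    """Anisotropy r = (I_par - I_perp) / (I_par + 2 * I_perp)."""
    return (i_par - i_perp) / (i_par + 2.0 * i_perp)


def fraction_bound(r_obs, r_free, r_bound):
    """Bound fraction from anisotropy, assuming equal brightness of free and bound ligand."""
    return (r_obs - r_free) / (r_bound - r_free)


# Hypothetical plate-reader intensities for one well (arbitrary units).
i_par, i_perp = 150.0, 100.0
print(polarization(i_par, i_perp))
print(anisotropy(i_par, i_perp))
# Hypothetical reference anisotropies for fully free and fully bound ligand.
print(fraction_bound(0.10, 0.05, 0.25))
```

Anisotropy rather than polarization is used for the bound-fraction step because anisotropies of a mixture add linearly in the fractional intensities, which keeps the two-state estimate simple.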

Fluorescence correlation spectroscopy (FCS) represents
another recently developed detection format eminently
suitable for HTS. FCS measures differences in physical
states of a target molecule, such as bound versus free
or cleaved versus intact, in a homogeneous mixture
[19]. Specifically, FCS measures the burst of fluorescent
emission of a molecule passing through a small volume of
space, which is defined by a sharply focused laser beam.
Small molecules diffuse through the volume rapidly and
thus yield short bursts of light. Binding of these small
molecules to larger molecules reduces their translational
diffusion and correspondingly increases the duration of the
bursts of light. Deconvolution of the emission patterns
in a sample by appropriate software can yield the
relative amount of the bound and unbound states of a
fluorescently tagged ligand. This technology can therefore
readily be applied to measure receptor-ligand interactions,
DNA-protein interactions, nucleic acid hybrid formation
and certain enzymatic reactions [20].
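
One common way to express the deconvolution step is a two-component autocorrelation model for free three-dimensional diffusion through a Gaussian observation volume. The sketch below is offered under that assumption; the review does not specify a model, and the function names, the structure parameter `kappa`, and the diffusion times are illustrative choices. It also assumes free and bound ligand are equally bright.

```python
import math


def diffusion_term(tau, tau_d, kappa=5.0):
    """Decay of the correlation for one species with diffusion time tau_d."""
    return 1.0 / ((1.0 + tau / tau_d) * math.sqrt(1.0 + tau / (kappa ** 2 * tau_d)))


def autocorrelation(tau, n_molecules, fraction_bound, tau_free, tau_bound, kappa=5.0):
    """Two-component model G(tau) for a mixture of free and bound ligand."""
    free = (1.0 - fraction_bound) * diffusion_term(tau, tau_free, kappa)
    bound = fraction_bound * diffusion_term(tau, tau_bound, kappa)
    return (free + bound) / n_molecules


# Assumed diffusion times: 0.1 ms for free ligand, 1 ms for the slower complex.
for fb in (0.0, 0.5, 1.0):
    g = autocorrelation(1e-4, 1.0, fb, tau_free=1e-4, tau_bound=1e-3)
    print(f"fraction bound {fb:.1f}: G(0.1 ms) = {g:.3f}")
```

Fitting measured correlation curves to such a model, with the bound fraction as a free parameter, is one way the relative amounts of bound and unbound ligand could be recovered.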



Cell-based assays 

Cell-based assays are an increasingly attractive alternative
to in vitro biochemical assays for HTS. Such in vivo assays
require an ability to examine a specific cellular process
and a means to measure its output. For instance, agonist
activation of a cell surface receptor or a ligand-gated
ion channel can elicit a change in the transcription
pattern of a number of genes. This ligand-induced
alteration in transcription can be readily captured by using
gene fusions, in which a promoter element responsive
to receptor activation is fused to the coding region
for an enzyme or protein whose levels can be easily
measured. Appreciation of the particular signaling pathway
associated with a specific receptor allows identification of
the appropriate transcriptional response element required
to detect a response. Figure 1 depicts a number of
signal transduction pathways, indicating the transcriptional
response elements coupled to each pathway. Several
reporter genes that generate products that can be adapted
to HTS format are available [21,22]. These are listed in
Table 1, with references to recent innovations in their use
[23•,24,25,26•]. For instance, the recent report of novel
fluorescent, cell-permeable substrates for β-lactamase
documents the use of β-lactamase to detect receptor
activation in single cells, making it an attractive assay
system for high density HTS [27••].

While cell-based assays using reporter genes have proved
effective as an HTS format, detecting more immediate
responses to target protein activation provides several advantages,
including shorter duration of the assay and fewer
false positives from nonspecific interactions. As indicated
in Figure 1, such cellular responses dependent on activation
of a receptor include elevation of a second messenger (for
example, Ca2+, cAMP, inositol trisphosphate), phosphorylation
of an intermediate signaling protein, or subcellular
translocation of a signaling molecule. Recent advances in
molecular biology and in instrumentation have made it
possible to monitor these events in an automated format.
For instance, the recent availability of a 96-well fluorescent
imaging plate reader (Molecular Devices, Sunnyvale,
California, USA) permits HTS of receptor activation by
monitoring Ca2+ mobilization of cells preloaded with a
fluorescent calcium indicator, such as FLUO-3 (Molecular
Probes, Eugene, Oregon, USA). In addition, recombinant
cells expressing a calcium-sensitive fluorescent protein,
such as aequorin [28•] or a hybrid calmodulin-GFP protein
[29••], obviate the need for preloading cells with dyes
in order to detect calcium fluxes following stimulation.
A separate approach to detecting early events following
receptor stimulation involves examining relocalization of
specific components of the signal transduction machinery.
For instance, MAP kinase (Figure 1) relocalizes from
the cytoplasm to the nucleus within minutes following
stimulation of an upstream G-protein-coupled receptor
[30,31]. Similarly, Barak et al. [32•] have shown that
recruitment of a β-arrestin-GFP fusion protein to the
plasma membrane can be used to monitor activation



New assay technologies for high-throughput screening Silverman et al. 399



Figure 1

[Pathway diagram. Panel (a): cAMP, protein kinase A, CREB, CRE; IP3, [Ca2+], CaM kinase, calcineurin, NF-AT; aSMase, ceramide, IκB-NFκB, NFκB. Panel (b): Ras, Raf, MEK, MAPK, ELK, SRE; Fos, Jun, AP1.]

Signal transduction pathways commonly used in mammalian cell-based high-throughput assays. (a) Agonist-engaged seven transmembrane
receptors are functionally linked to the modulation of several well characterized enhancer/promoter elements, the cAMP response element (CRE),
nuclear factor of activated T cells (NF-AT), NFκB, serum response element (SRE) and AP1 [46-49]. Upon activation of a Gαs coupling receptor,
adenylyl cyclase is stimulated, producing increased concentrations of intracellular cAMP, stimulation of protein kinase A, phosphorylation of
the CRE binding protein (CREB) and induction of promoters with CRE elements. Gαi coupling receptors dampen CRE activity by inhibition of
the same signal transduction components. Gαq coupling receptors and some βγ pairs stimulate phospholipase C (PLC), and the generation of
inositol trisphosphate (IP3) and diacylglycerol (DAG). A transient flux in intracellular calcium promotes induction of calcineurin and NF-AT, as well
as calmodulin (CaM)-dependent kinase and CREB. Increased DAG concentrations stimulate protein kinase C (PKC) and endosomal/lysosomal
acidic sphingomyelinase (aSMase); while the aSMase pathway is dominant, both induce degradation of the NFκB inhibitor IκB as well as NFκB
activation. By a poorly understood mechanism, IκB degradation may also be initiated through the MAPK (mitogen-activated protein kinase)
cascade (not shown). (b) Growth factor receptor (depicted by ellipses) activation results in recruitment of Sos (not shown) to the plasma
membrane, where it stimulates Ras, which recruits the serine/threonine kinase Raf to the plasma membrane. Once activated, Raf phosphorylates
MEK kinase, which phosphorylates and activates MAPK and the transcription factor ELK (Ets-like protein, also known as p62 TCF [ternary
complex factor 1]). ELK drives transcription from promoters with SRE elements, leading to synthesis of the transcription factors Fos and Jun,
that form a transcription complex capable of activating AP1 sites. Seven transmembrane receptors also stimulate the MAPK pathway through
βγ subunits, most probably through phosphoinositide 3-kinase γ (PI3Kγ; not shown).



of a number of different G-protein-coupled receptors.
Recent advances in microscopic imaging technology, in
conjunction with software permitting automated image
recognition, provide a means to capture these events in
a high-throughput mode.



Cell-based assays have significant advantages over in vitro
assays. First, the starting material (the cell) self-replicates,
avoiding the investment involved in preparing a purified
target, in chemically modifying the target to suit the
screen, and so on. Second, the targets and readouts are examined
in a biological context that more faithfully mimics
the normal physiological situation. Third, cell-based assays
can provide insights into bioavailability and cytotoxicity.
Mammalian cells are expensive to culture and difficult
to propagate in the automated systems used for HTS,
however.

Table 1

Reporter gene (source): β-galactosidase (bacterial)
Advantages: Well characterized; stable; inexpensive
substrates; highly sensitive fluorescent or chemiluminescent
substrates available; little interference from test compounds;
simple readouts (readily automated)
Disadvantages: Endogenous activity (mammalian cells);
tetrameric (non-linear response at low concentration)
References: [23•,50]

Reporter gene (source): Luciferase (firefly)
Advantages: Dimeric; high specific activity; no
endogenous activity (low background)
Disadvantages: Requires addition of cofactor (luciferin)
and presence of O2 and ATP
References: [23•]

Reporter gene (source): Alkaline phosphatase (human placental)
Advantages: Secreted protein (avoids the need for
membrane-permeable substrates); inexpensive colorimetric
and highly sensitive luminescent assays available
Disadvantages: Endogenous activity in some cell
types; optimal at pHM
References: [24,25]

Reporter gene (source): β-lactamase (bacterial)
Advantages: Monomeric; highly sensitive fluorogenic
substrates described; no endogenous activity
Disadvantages: Membrane-permeable fluorescent
substrates not readily available
References: [27••]

Reporter gene (source): GFP (jellyfish)
Advantages: Monomeric; no substrate needed (no
manipulations required for assay); no endogenous
activity; multiple forms available
Disadvantages: Relatively low specific activity
References: [26•,31,32•]

An alternative to mammalian cell-based assays is to
recapitulate the desired human physiological process in a
micro-organism such as yeast [33]. For instance, signaling
via human G-protein-coupled receptors has been reconstituted
in yeast to yield a facile growth response or a reporter
gene readout ([34,35]; Klein et al., unpublished data).
Similarly, mammalian ion channels have been coupled
to growth response in yeast [36]. Also, protein-protein
interactions, including RAS-RAF association [37] and
tyrosine kinase receptor-ligand binding [38], have been
faithfully reproduced using the yeast two-hybrid system.
Finally, many mammalian transcription factors operate in
yeast, including glucocorticoid receptor [39,40] and the
retinoic acid receptor and retinoid X receptor families of
receptors [41]. The ease and low cost of growing yeast,
their ready genetic manipulation, and their resistance to
solvents make yeast an attractive option for cell-based
HTS.

Miniaturization

Several factors are fueling efforts to increase the speed
of HTS and decrease the volume of individual reactions
within an HTS format. Split-bead synthesis (see Note
added in proof), or other similar approaches to combinatorial
chemistry, dramatically increases the number of
compounds that can be produced in a library but does so at
the cost of quantity of material. In addition, the limited
supply of existing compounds within chemical libraries
of pharmaceutical companies, and the growing number
of targets against which such compounds can be tested,
motivate a frugal approach to use of those compounds.
Finally, the reagent costs associated with HTS, when
multiplied by the increasing number of assays per run, are
becoming a significant cost of early stage drug discovery.

In response to these exigencies, a number of groups 
have begun to develop formats for very high density 
screening using very small assay volumes. One approach 
involves reducing the well size and increasing the density 
of the assay plate but retaining the overall assay format 
used in current 96-well based HTS. Densities of 6500 
assays in a 10 cm array have been reported for cell-free 
enzyme based assays [42•] and for ligand binding in 
cell based assays [43••]. This approach of miniaturizing 
existing formats significantly increases the number of 
assays per plate and the overall throughput of the screen 
but is intrinsically limited by the physical constraints 
of delivering small volumes to wells, and of detecting 
responses in a sensitive and timely manner. Accordingly, 
novel formats have been developed that eschew the 
assay format based on wells. One approach uses glass 
chips containing microchannels in which reagents, target 
proteins and compounds are herded by electrokinetic flow 
controlled by electric potentials applied at the ends of the 
channels [44•]. A related approach attains high throughput 
both of chemical synthesis and activity assessment by 
parallel arrays of three-dimensional channels in which 
flow is controlled by miniature hydrostatic actuators [45]. 
These approaches provide significant reduction in the 
volume of assays and a corresponding savings in reagent 
costs over conventional HTS [45]. In addition, with 
further development in parallel processing in multiple 
chips, the number of assays performed in a given period 



New assay technologies for high-throughput screening Silverman et al. 401 



of time can increase dramatically. This movement to 
miniaturization is likely to ensure that the initial stage of 
drug discovery, identification of lead compounds, will not 
be the rate-limiting step in finding new drugs. 

Conclusions 

The last decade has witnessed the emergence across the 
pharmaceutical industry of the 96-well-based, robotics-driven, 
high-throughput screening process as the primary 
tool for identifying active compounds in the first stage 
of drug discovery. This program has dictated the format 
of the assays that are used to assess the activities of targets 
(enzymes, receptors, transporters and so on) that 
underlie drug discovery in various therapeutic areas. 
A number of such formats (resonant energy transfer 
and fluorescent polarization spectroscopy in cell-based 
assays) have gained widespread acceptance and growing 
incorporation into high-throughput screening programs. 
The growing number of potential therapeutic targets, 
the increasing number of screenable compounds, the 
accelerating costs of screening and the increasing pressure 
to generate more lead compounds in a shorter time all 
conspire to render even the new approaches inadequate for 
meeting the anticipated throughput requirements, however. 
Thus, we are likely to witness a movement towards 
even greater screening throughput by miniaturization and 
increased reliance on robotics. Whether a new standard 
format for screening emerges in the near future, or whether 
a variety of formats are pursued concurrently, remains 
to be seen. Nonetheless, we can anticipate that the 
exigencies of drug screening will motivate a continued 
application of state-of-the-art technologies to the process 
of high-throughput screening. 

Note added in proof 

For a reference describing split-bead synthesis, see [53]. 



References and recommended reading 

Papers of particular interest, published within the annual period of review, 
have been highlighted as: 

• of special interest 

•• of outstanding interest 

1. Udenfriend S, Gerber L, Brink L, Spector S: Scintillation 
proximity assay: a sensitive and continuous isotopic 
method for monitoring ligand/receptor and antigen/antibody 
interactions. Anal Biochem 1987, 161:494-500. 

2. Cook N: Scintillation proximity assay: a versatile high-throughput 
screening technology. Drug Discov Today 1996, 
1:287-294. 

3. Lerner C, Saiki A: Scintillation proximity assay for human DNA 
topoisomerase I using recombinant biotinyl-fusion protein 
produced in baculovirus-infected insect cells. Anal Biochem 
1996, 240:185-196. 

4. Mathis G: Rare earth cryptates and homogeneous 
fluoroimmunoassays with human sera. Clin Chem 1993, 
39:1953-1959. 

5. Mathis G: Probing molecular interactions with homogeneous 
techniques based on rare earth cryptates and fluorescence 
energy transfer. Clin Chem 1995, 41:1391-1397. 

6. Braunwalder A, Yarwood D, Sills M, Lipson K: Measurement 
• of the protein tyrosine kinase activity of c-src using time-resolved 
fluorometry of europium chelates. Anal Biochem 
1996, 238:159-164. 
The authors describe an assay method to evaluate the activity of protein 
tyrosine kinases that uses europium chelate-labeled anti-phosphotyrosine 
antibodies to detect phosphate transfer to a polymeric substrate coated onto 
microtiter plate wells. Using time-resolved, dissociation-enhanced fluorescence 
increased sensitivity and reduced interference from test compounds. 

7. Gaarde W, Hunter T, Brady H, Murray B, Goldman M: 
• Development of a nonradioactive, time-resolved fluorescence 
assay for the measurement of Jun N-terminal kinase activity. 
J Biomol Screen 1997, 2:213-223. 
The authors describe a nonradioactive, high-throughput, time-resolved 
fluorescence assay for jun kinase (JNK) activity using europium-labeled 
antibody that is specific for amino-terminally phosphorylated Jun. The optimized 
europium-based assay is approximately 18-fold more sensitive than a similar 
based JNK assay. 

8. Mitra R, Silva C, Youvan D: Fluorescence resonance energy 
• transfer between blue-emitting and red-shifted excitation 
derivatives of the green fluorescent protein. Gene 1996, 
173:13-17. 
The authors report fluorescent resonance energy transfer (FRET) between 
two linked variants of the green fluorescent protein, one of which is fused 
to the amino terminus of a flexible polypeptide linker containing a Factor Xa 
protease cleavage site while the second of which is fused to the carboxy 
terminus of the polypeptide. Cleavage of the peptide linker with Factor Xa 
yields a marked decrease in energy transfer, making this a viable homogeneous 
assay format for proteases. 

9. Aucouturier P, Preud'homme A, Lubochinsky B: Fluorescence 
polarization immunoassay of estradiol. Diagn Clin Immunol 
1983, 1:310-314. 

10. Eremin SA, Gallacher G, Lotey H, Smith DS, Landon J: Single-reagent 
polarization fluoroimmunoassay of methamphetamine 
in urine. Clin Chem 1987, 33:1903-1906. 

11. Wei A-P, Herron JN: Use of synthetic peptides as tracer 
antigens in fluorescence polarization immunoassays of high 
molecular weight analytes. Anal Chem 1993, 65:3372-3377. 

12. Murakami A, Nakaura M, Nakatsuji Y, Nagahara S, Tran-Cong 
Q, Makino K: Fluorescent-labeled oligonucleotide probes: 
detection of hybrid formation in solution by fluorescence 
polarization spectroscopy. Nucleic Acids Res 1991, 19:4097-4102. 

13. Jolley Consulting and Research Inc on the World Wide Web: 
http://www.jolley.com/. 

14. Seethala R, Menzel R: A homogeneous, fluorescence 
• polarization assay for src-family tyrosine kinases. Anal Biochem 
1997, 253:210-218. 
The authors describe a homogeneous assay for a src kinase family member 
that consists of a fluoresceinylated peptide substrate for the kinase 
and an anti-phosphotyrosine antibody. 

15. Lynch B, Loiacono K, Tiong C, Adams S, MacNeil I: A 
• fluorescence polarization based Src-SH2 binding assay. Anal 
Biochem 1997, 247:77-82. 
The authors describe an assay to detect compounds that interfere with the 
binding of an SH2 domain to its phosphotyrosine peptide target. The assay 
consists of the src SH2 domain and a fluoresceinylated, phosphotyrosine-containing 
hexapeptide to which the src SH2 domain binds. Compounds 
that interfere with this interaction are detected by a decrease in fluorescence 
polarization. 

16. Wu P, Brasseur M, Schindler U: A high-throughput STAT binding 
• assay using fluorescence polarization. Anal Biochem 1997, 
249:29-36. 
The authors describe an assay to detect compounds that interfere with the 
interaction between STAT1, a transcription factor that is activated upon 
γ-interferon binding to its receptor, and the phosphotyrosine-containing peptide 
derived from γ-interferon receptor with which it interacts. Binding is evaluated 
using fluorescence polarization. 

17. Schade S, Jolley M, Sarauer B, Simonson L: BODIPY-alpha-casein, 
•• a pH-independent protein substrate for protease 
assays using fluorescence polarization. Anal Biochem 1996, 
243:1-7. 

18. Levine L, Michener M, Toth M, Holwerda B: Measurement of 
• specific protease activity utilizing fluorescence polarization. 
Anal Biochem 1997, 247:83-88. 
The authors describe a homogeneous fluorescence polarization assay to 
measure proteolytic cleavage of the peptide substrate for a protease. The 
peptide substrate was derivatized by biotinylation of the amino terminus and 
labeling with a fluorescein derivative at the carboxy terminus. Incubation of 
this substrate with protease and subsequent addition of avidin produced a 
polarization signal that was proportional to the relative amounts of cleaved 






h *o miitur* ""™' P™»«>e» "eBW «ba<wp«iv» eompouiKto 

' * teJiuS^l/^S^n^^TS^^ .PPOeatton to 

MsSTol^B^y^awT' «««-*^B»- '^oe So" 

Dhundalo ^ Qocfdard C; Reporter asftm Ifi hi^K. 

^ Brw«efn t Duel 
2™JScenoo-ba»ed reporter aene assar for luciferaM «fiH 
P-OatectesidaseL SA»fBchn»uo» lOoaaSksaSji 



20. 



2t. 



32. 



23. 



31. Gonzaia FA, Seth A. Raden Dl_ BoMmif. no c» e*> ^ 
SarwnHndueed ^rm^Sonof^^^J^^^I?' 

receptor ec^So? J^^ll?^ S?*^" ^ protelft^eoupiod 

to obtain a m^'^ST^tf??^ k^'S^^^* «niugale 
god receptor (6PcS^SSSDS^f^^p»S!ij^ G proteii:Soo- 
»~aocrQPCR-omMiinfa«MS<^ receptor 
that the f^^^^^li^S^SSi^t^'^^'^ micro^Spy. 
sponao to acdvathm fO^rTm^l^^HS^^ ^aama membrane in ro- 

SrKiS?ttehISi2SS^ 

y a«ma/ Sow lOoTSl 51 "^^^"^^^otfe compourKla. 

34. 



2a 



20. 



1994, 17:172-177. ®^ Pioducta. afotocAnnrm 

&on«etri K Maitin C, Fortin J. Oleaen Q Voyta i: 
Olw^Iumineacence: senscttvo detecdoit technotooy for 
reporter «ene emaaya. Cin Ca«„ ,996, 4^1642-^5^ 

t»»ol09y Bnd bldtechftoloai^ fiai ^SSTl 997, 

wi^M 'rJSS!'^ JL^*'** N. Feng U 

IfaT-oTL^lLS?^^ *f ^* QuantitBtlon of tranacriptionmd 
cto™J^«foctloo of ajmlo nving cefla wHh bSnS-rS^ 

from green to £t» b^dSeSfS^^r^^ ^^^^ «^ 

•oven tranamembrSe >«»rtS^S?J^ and any of a number of 

dent agoniaTmeSSS SSSSIL^^2fl * '«^«L«o~«'rtratlorvdopef». 
aDnroSi^ovfrfllrL??- • * reapcnse. The euihora auggmt that tWa 

mechanSm Jr^Tbo «SSI«v Phy«»otogical «gna) tranaducdon 

Taion R: Ruoreicent fiulleatora for bSSi, o,S-:' 

nucioua and endoplaamic reticutum at. amgle' HaLa cafla tmrmfM^Iii^^ 
2^«.tary DNAs encoding chimaer,S^SLSS".^^^ 



30. 



L«»ormand P. Sardaf C. Pages Q, L'AJIamain G, Bnmet A. 

^3!^*?S^,SSr ^^"^ nbroblaat^ y Ce0 



sa 

36. 
37. 

3a 

39. 
40. 

41. 
42. 



•dmnergte receptor end Qaa aubunit Soence 199a W):12|. 

ShTf^^^SSS^^SJ^^ BA. Pausch 

e« 1 998~ SlSMloST^ ~Pon.o pathwe^^ A/o/ Ceif 

ehamtel In yeaat Mo/ c27^5'S??2^?1?4a^^^ '^^ 

2?,SS£.2^J!S^^*^'i^!^^ interaction of llganda 
ana receptwa of tho hmatopolette swperfamUv fn ^2^^^/ 
&*doennol 1 09S. 9:1 321 -1 329. •***^*m«y tn yeaat Vo/ 

Schena M, Yamameto KR: MammaliAri rrttti-nmrti. „t i 
92:470? -470a^ Nati Acad So USA 1995, 



poundBS^SSoMSr^SSiSfrS^ ""^^Mda. in which com- 
wreyoo aaaay format for detecttna amau — t ■ zzl V ^ 

TK- ,'^*"™?**~ «^ cyiwm aSiggTT-B&S^^"**^" 

The authoia deacribe a miniahuizad ^a!*^mZTv!^Z^ ' . 

ing of Cganda ttSSed bTSS?^ ^5^22? toehnique for die ecroen- 

plaattc devTcea prepared uaina a ea^^^tC^^ airayed on 
mer molding. u£«r SntSSS^^S^^ Po^ 

<fimenatena of a eta^dloSn 

Tito mjttioTO dembe a novel, mWaturiied enzyme aaaav format n«rt«r«^ 
wrthin a mtcrofabricated channal n«fM«fr ^ . ^vr^ 7. 'ormai> performed 

fow B- g i rftniidw , meloeulea. Owjg Omm Tba»r 1Q07. 3:306. 



46. Eder J: Tumor necrosis factor α and interleukin 1 signalling: 
do MAPKK kinases connect it all? Trends Pharmacol Sci 1997, 
18:319-322. 

47. Hill CS, Treisman R: Transcriptional regulation by extracellular 
signals: mechanisms and specificity. Cell 1995, 80:199-211. 

48. Premack BA, Schall TJ: Chemokine receptors: gateways to 
inflammation and infection. Nature Med 1996, 2:1174-1178. 

49. Schütze S, Wiegmann K, Machleidt T, Krönke M: TNF-induced 
activation of NF-κB. Immunobiology 1995, 193:193-203. 

50. Craig D, Arriaga E, Banks P, Zhang Y, Renborg A, Palcic M, 
Dovichi N: Fluorescence-based enzymatic assay by capillary 
electrophoresis laser-induced fluorescence detection for the 
determination of a few β-galactosidase molecules. Anal Biochem 
1995, 226:147-153. 

51. Anderson M, Tjioe I, Lorincz M, Parks D, Herzenberg L, Nolan G, 
Herzenberg L: Simultaneous fluorescence-activated cell sorter 
analysis of two distinct transcriptional elements within a single 
cell using engineered green fluorescent proteins. Proc Natl 
Acad Sci USA 1996, 93:8508-8511. 

52. Heim R, Tsien R: Engineering green fluorescent protein for 
improved brightness, longer wavelengths and fluorescence 
resonance energy transfer. Curr Biol 1996, 6:178-182. 

53. Burbaum JJ, Ohlmeyer MH, Reader JC, Henderson I, Dillard LW, 
Randle TL, Sigal NH, Chelsky D, Baldwin JJ: A paradigm for drug 
discovery employing encoded combinatorial libraries. Proc 
Natl Acad Sci USA 1995, 92:6027-6031. 




Exhibit 33 






High-throughput screening: advances in assay technologies 

G Sitta Sittampalam*‡, Steven D Kahl*§ and William P Janzen† 



Both isotopic and nonisotopic assay methodologies are 
employed in high-throughput screening for drug discovery. 
Recent advances in cell-based and in vitro biochemical 
assays will be reviewed, with special emphasis on detection 
technologies amenable to automated 'mix and read' 
procedures in high-throughput screening. A major trend is 
the advent of homogeneous assay systems which employ 
fluorescence resonance energy transfer, fluorescence 
polarization, and fluorescence correlation spectroscopy. 
Cell-based assay systems have also become popular 
in high-throughput screens in which active compounds 
that directly modulate the disease target are identified. 
Colorimetric and amperometric methods have also been 
described recently, but are yet to be adapted widely in 
high-throughput screens. 

Addresses 

*Research Technologies and Proteins, Lilly Research Laboratories, Eli 
Lilly and Company, Indianapolis, Indiana 46285, USA 
†Sphinx Pharmaceuticals, A Division of Eli Lilly and Company, 
4615 University Drive, Durham, North Carolina 27707, USA; 
e-mail: janzen@lilly.com 
‡e-mail: sitta@lilly.com 
§e-mail: skahl@lilly.com 

Current Opinion in Chemical Biology 1997, 1:384-391 
http://biomednet.com/elecref/1367593100100384 
© Current Biology Ltd ISSN 1367-5931 
Abbreviations 

DABCYL  4-(4'-dimethylaminobenzeneazo)benzoic acid 
EDANS   5-(2'-aminoethyl)aminonaphthalene sulfonic acid 
FCS     fluorescence correlation spectroscopy 
FLIPR   fluorescence imaging plate reader 
FPA     fluorescence polarization assay 
FRET    fluorescence resonance energy transfer 
HTRF    homogeneous time-resolved fluorescence 
HTS     high-throughput screening 
RET     resonance energy transfer 
SPA     scintillation proximity assay 
WGA     wheat germ agglutinin 



Introduction 

The discovery of pharmaceutical agents with novel 
structures and potential therapeutic activity is a complex 
process. It usually begins with intensive studies of 
the physiological and clinical manifestations of diseases, 
followed by the identification of relevant genes and/or 
associated biological targets for therapy. Recent advances 
in molecular biology and DNA sequencing techniques 
have made tremendous progress toward sequencing large 
genomes [1]. It is anticipated that the sequencing of the 
entire human genome, which consists of ~3000 megabases 
(over 100,000 genes), will be completed in the early part 
of the next century. Hence the identification of genes that 
determine the expression of biological targets associated 
with human disease is rapidly advancing, opening new 
and exciting opportunities for the discovery of life-saving 
drugs. 

Coupled with these advances are developments in combinatorial 
chemistry, where large and structurally diverse 
chemical libraries are being generated at an unprecedented 
rate using parallel synthesis [2]. Innovations in 
powerful computers, automation and software technology 
have provided an ideal environment to test hundreds of 
thousands of compounds for biological activity, identifying 
active molecules or 'hits' that can rapidly develop into 
potential drugs or 'leads' with desired therapeutic activity. 

High-throughput screening (HTS) is the process of testing 
a large number of diverse chemical structures against 
disease targets to identify 'hits'. Excellent introductions 
and reviews on HTS have 
been published recently [3••,4•,5,6••]. Briefly, current 
state-of-the-art HTS operations are highly automated and 
computerized to handle sample preparation, assay procedures 
and the subsequent processing of large volumes of 
data. Each one of these steps requires careful optimization 
to operate efficiently and screen 100-300,000 compounds 
in a month period. Hence a modern HTS operation 
is a multidisciplinary field involving analytical chemistry, 
biology, biochemistry, synthetic chemistry, molecular biology, 
automation engineering and computer science [5]. 

Central to the HTS process is an in vitro biochemical 
or cell-based assay using a validated biological target 
representing a disease state. In this paper, we will focus on 
current assay technologies that are employed in HTS, with 
emphasis on their advantages and disadvantages. Developing 
detection technologies with potential applicability to 
HTS will also be briefly reviewed. 

HTS Instrumentation and capabilities 

In general, the instrumentation used in HTS assays should 
be accurate, reliable and easily amenable to automation. 
Analytical methods should be robust and reproducible, 
with stable reagents and signal responses. Signal-to-noise 
(S/N) ratios should be large enough to generate signal 
windows [7•] that allow reliable detection of 'hits'. Equally 
important are assays with 'mix and measure' protocols, 
which are easier to automate than analytical methods with 
complex separation steps such as centrifugation, washing 
and filtration. This is particularly true as the industry 
moves toward ultra-HTS assays which will screen over 
100,000 compounds per day [8]. Another advantage of 
'mix and measure' assays is that binding measurements 
are made under equilibrium conditions (without washing, 
filtration etc.), and are therefore useful for investigating 
low affinity interactions [9]. 
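
The two quality measures named above can be illustrated with a small calculation. The text does not give formulas, so the definitions below are assumptions: S/N is taken as the difference between the mean signal and mean background divided by the background standard deviation, and the signal window follows one commonly used form, in which three standard deviations are subtracted from each control population before the remaining gap is scaled by the standard deviation of the maximum-signal controls. The well readings in the usage note are invented for illustration.

```python
from statistics import mean, stdev


def signal_to_noise(signal_wells, background_wells):
    """(mean signal - mean background) / SD of background (assumed definition)."""
    return (mean(signal_wells) - mean(background_wells)) / stdev(background_wells)


def signal_window(max_wells, min_wells):
    """Gap between control populations after a 3-SD guard band on each side,
    expressed in units of the max-control SD (assumed definition)."""
    gap = abs(mean(max_wells) - mean(min_wells))
    guard_band = 3 * (stdev(max_wells) + stdev(min_wells))
    return (gap - guard_band) / stdev(max_wells)


if __name__ == "__main__":
    # Hypothetical control wells from one plate (arbitrary units).
    max_controls = [100, 102, 98, 100]
    min_controls = [10, 11, 9, 10]
    print(round(signal_to_noise(max_controls, min_controls), 1))
    print(round(signal_window(max_controls, min_controls), 1))
```

Under these assumed definitions a larger window means the two control populations stay separated even after allowing for well-to-well scatter, which is the property the text asks of an assay before 'hits' can be called reliably.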



High-throughput screening Sittampalam, Kahl and Janzen 385 



Standard HTS assays are currently run in 96-well microtiter 
plates in batch formats, since automation and detection 
instruments have been designed to be compatible 
with these plates. Combinatorial chemical synthesis can 
also be carried out in 96-well plates, making these plates a 
standard platform in nearly all HTS operations. Although 
assays in plates with 384 wells (as well as 864 and 
1536 wells, which use the same plate dimensions) are being 
tested, assay formats based on these high density plate 
formats have yet to be widely implemented. 
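
As a rough illustration of why higher-density plates matter, the sketch below counts the plates needed to screen a compound set in each of the well formats mentioned in the text. The figure of 100,000 compounds per day is the ultra-HTS target quoted earlier; the `control_wells` parameter and the assumption of one compound per well are hypothetical simplifications, not something the article specifies.

```python
import math


def plates_needed(n_compounds, wells_per_plate, control_wells=0):
    """Plates required when each compound occupies one well and a fixed
    number of wells per plate is reserved for controls (hypothetical layout)."""
    usable = wells_per_plate - control_wells
    if usable <= 0:
        raise ValueError("control wells leave no room for compounds")
    return math.ceil(n_compounds / usable)


if __name__ == "__main__":
    for wells in (96, 384, 864, 1536):
        print(wells, plates_needed(100_000, wells))
```

With no control wells reserved, the count falls from 1042 plates in the 96-well format to 66 in the 1536-well format for the same 100,000 compounds.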

Common therapeutic targets for HTS are enzymes, cell 
surface receptors, nuclear receptors, ion channels, and 
signal transduction proteins [3••]. Compounds that interact 
with these targets are usually identified using in vitro 
biochemical assays; however, cell-based assays using engineered 
mammalian cell lines are now widely employed 
in HTS. This is because the ligand interaction occurs in 
the biological environment of the target, which provides 
opportunities to simultaneously monitor secondary cellular 
events such as cytosolic Ca²⁺ mobilization and other 
G-protein-coupled signaling. In addition, the target need 
not be purified extensively in order to be compatible 
with the in vitro screening conditions. Cell-based assays 
also screen simultaneously for the bioavailability of test 
compounds when intracellular targets such as nuclear 
receptors are involved. A major disadvantage, however, 
is the cost and difficulty of producing stable, engineered 
eukaryotic cell lines. Special techniques, instrumentation, 
and reagents compatible with cell-based assays have to 
be developed. Once in place, however, HTS laboratories 
are able to employ cell-based screens routinely. Detection 

Figure 1 



technologies available for both types of assays will be 
reviewed below. 

Detection technologies 

Radiochemical methods 

Detection technologies employed in high-throughput 
screens depend on the type of biochemical pathway being 
investigated. For example, in vitro receptor binding assays 
with Kd values in the nanomolar to picomolar (nM-pM) 
range generally employ radiometric detection. The same 
is true for protein-protein interaction assays with Kd 
values in the micromolar to nanomolar (μM-nM) range. 
Enzymatic assays, on the other hand, routinely employ 
colorimetric, fluorimetric and radiometric detection. 

Although filtration-based receptor binding assays have 
been used extensively in the past (to separate the 
bound and free radiolabeled ligand), the scintillation 
proximity assay (SPA) has become the standard assay 
in many HTS operations, mainly because it does not 
require a separation step, and can be easily automated 
[9,10,11•,12•,13,14,15•,16-21]. SPA can also be easily 
adapted to a variety of enzyme assays [13,14,15•,16] and 
protein-protein interaction assays [9,18,19]. 

One version of SPA utilizes polyvinyl toluene (PVT) microspheres 
or beads (~5 μm diameter, density ~1.05 g/cm³) 
into which a scintillant has been incorporated (Figure 1; 
[8]). When a radiolabeled ligand is captured on the 
surface of the bead, the radioactive decay occurs in close 
proximity to the bead, and effectively transfers energy to 
the scintillant, which results in light emission. When the 




Principles of scintillation proximity assay (SPA) technology. (a) The path length of decay for the β-particle released by the isotope is not close 
enough to the SPA bead and the energy is dissipated in the aqueous medium, resulting in little or no detection. (b) When the radioligand is 
bound to the SPA bead (through a specific capture molecule) the β-particle released is capable of exciting the scintillant contained within the 
bead and detectable light is emitted. 



386 Analytical techniques 



radiolabel is displaced or inhibited from binding to the 
bead, it remains free in solution and is too distant from 
the scintillant for efficient energy transfer. Energy from 
radioactive decay is dissipated into the solution, which 
results in no light emission from the beads. Hence the 
bound and free radiolabel can be detected without the 
physical separation required in filtration assays. 

The outer surface of the SPA bead is coated with a 
hydrophilic polyhydroxy film that reduces hydrophobicity 
of the bead to reduce nonspecific interactions. This film 
has been chemically derivatized to covalently couple 
generic capture molecules. PVT beads with the following 
capture molecules are commercially available: Protein A, 
avidin, streptavidin, wheat germ agglutinin (WGA), glutathione, 
and sheep antimouse, donkey antirabbit and donkey 
antisheep antibodies. All of these capture molecules 
are used routinely as one member of a detection-pair 
system. These beads are easily pipetted using automated 
liquid handling devices into 96-well plates and, therefore, 
are easily accommodated into HTS operations. 

The ideal isotopes for labeling ligands used in SPA 
assays are ³H and ¹²⁵I. This is because the β particles 
from ³H have a relatively short pathlength, about 1.5 μm, 
which easily fulfils the distance requirement for SPA. The 
Auger electrons emitted by ¹²⁵I, which travel between 
approximately 1 μm and 17.6 μm in aqueous media, also 
satisfy this distance requirement. Other commonly used 
isotopes in biology (¹⁴C, ³⁵S, ³³P) emit particles 
with longer pathlengths and are not suitable for SPA 
beads, since their decay is detected by the scintillant, 
even when the ligand is not bound to the surface of 
the bead (this is called the nonproximity effect). An 
SPA using ³³P-labeled substrate for the cytomegalovirus 
protease has been reported, however [15•]. The decay 
pathlength for this isotope is ~126 μm, and it is not 
clear how the nonproximity effect was avoided in this 
case. In a similar screen using ³³P-labeled peptide for 
calcineurin phosphatase activity, the nonproximity effect 
was successfully minimized by a simple centrifugation 
of assay plates [16]. Other enzyme assays for topoisomerase 
I [13] and N-acetylgalactosaminyltransferase [14] 
utilized ³H-labeled substrates. The disadvantage of using 
³H is that the signals can be quite small, and disposal 
requires special precaution due to its long half-life. Other 
recent applications of SPA beads include a toxicokinetic 
study of antisense oligonucleotides in plasma [17] and 
a kinetic analysis of inositol triphosphate binding to its 
receptor [20]. It appears that the use of SPA technology 
may rapidly expand beyond HTS into other areas of 
drug discovery and development such as genomics, cell, 
metabolism and toxicology. 
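
The pathlength argument above can be sketched as a simple lookup. The pathlengths are the values quoted in the text (the upper figure is used for the iodine Auger electrons); the 20 μm cutoff is purely a hypothetical threshold chosen to separate the isotopes the authors call suitable from the one they flag, not a figure from the article.

```python
# Pathlengths in aqueous media, in micrometres, as quoted in the text.
PATH_LENGTH_UM = {"3H": 1.5, "125I": 17.6, "33P": 126.0}

# Hypothetical cutoff, assumed only for illustration.
ASSUMED_CUTOFF_UM = 20.0


def suits_bead_spa(isotope, cutoff_um=ASSUMED_CUTOFF_UM):
    """True if the emitted particle is short-ranged enough that unbound label
    should rarely excite the bead (i.e. little nonproximity effect)."""
    return PATH_LENGTH_UM[isotope] <= cutoff_um


if __name__ == "__main__":
    for isotope in PATH_LENGTH_UM:
        print(isotope, suits_bead_spa(isotope))
```

In practice the boundary is not sharp: as the text notes, longer-range emitters have been used with beads when an extra step such as centrifugation limits the nonproximity signal.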

SPA can also be carried out in scintillating microplates 
[9,21,22•], in which the scintillant is directly incorporated 
into the plastic, or is coated on the inner surface of 
the wells. These plates are available from two sources. 
Flashplate® is from NEN™ Life Science Products 
(Boston, MA) in which the scintillant is coated on the inner 
surface of the wells. The Scintistrip® plate is from Wallac 
Oy (Turku, Finland) which is made by incorporating the 
scintillant into the entire plastic. With appropriate washing 
(not a 'mix and measure' technique) these plates offer the 
advantage of eliminating nonproximity effects. In addition, 
these plates are available without licensing fees (required 
for the bead technology). One example of this is a 
protein-peptide interaction screen in which the binding of 
a 13 amino acid phosphopeptide fragment of the epidermal 
growth factor (EGF) receptor to the GRB2-SH2 binding 
domain was investigated using the Scintistrip® plates [9]. 
The screen consisted of adding compounds to be tested 
and the ¹²⁵I-labeled phosphopeptide, respectively, to 
a plate pre-coated with GRB2-SH2 binding domain, 
followed by a one hour incubation at room temperature. 
It was, however, necessary to remove all liquid from the 
wells followed by air-drying the plates before counting. 
This removal is essential to minimize nonproximity effects 
which contribute to background noise. An additional 
advantage of these plates is that they are compatible with 
other isotopes such as ³⁵S, ³³P and ³²P. 

A more recent development is the Cytostar-T™ (Amersham 
Life Sciences, Cardiff, Wales) scintillating microplates 
[21] which were specially designed for cell-based 
proximity assays. Scintillant is incorporated into the base 
plate of microtiter plates and can also detect additional 
isotopes such as ¹⁴C, ⁴⁵Ca, ³⁵S and ³³P. These plates have 
been successfully used to monitor ¹⁴C-labeled thymidine 
uptake by cultured cells, and to measure ⁴⁵Ca²⁺ flux 
through ionotropic glutamate-gated ion channels. The 
Cytostar-T™ plates were also used to detect mRNA 
transcripts in a high volume in situ hybridization [22•]. This 
is an interesting example of how HTS assay concepts are 
being applied to gene expression and target identification 
studies. 



Non-isotopic detection methods 

Colorimetry and luminescence 

Colorimetric and luminescence detection methods have 
significant advantages for HTS laboratories, particularly 
in light of the cost, safety and disposal issues associated 
with radiochemical methods. HTS operations require 
relatively large amounts of reagents during scale-up, 
operations and follow-up phases. Radiolabeled reagents 
are expensive, and the scientists running radioactive 
screens should be adequately trained and monitored. 
Since luminescence methods can be as sensitive as 
radioactive methods, with low detection limits, these 
techniques are being used increasingly in HTS assays 
[23,24•,25-29,30•,31-34,35•,36,37,38•,39•,40-42,43••,44-51]. 
Glazer [24•] and Czarnik ([25] and the Fluorescent 
Chemosensors and Biosensors Database on the 
World Wide Web URL: http://biomednet.com/fluoro/) 
have reviewed the utility and need for fluorescence-based 







Figure 2 

Principles of fluorescence resonance energy transfer. The transfer is inversely proportional to the sixth power of the distance between the 
donor-acceptor pair, and occurs only when they are in close proximity via binding to the same ligand. Interactions of the labels with the medium 
and nonspecific fluorescence from the medium itself and the spectral overlap of the donor emission and acceptor absorption can significantly 
affect the measured signal. Hence the selection of the donor-acceptor pair is critical to the success of energy transfer experiments. Acceptors 
with long fluorescent lifetimes (microseconds) allow time-resolved measurement of the fluorescence emission. Time-resolved measurements 
significantly enhance the signal-to-noise ratios, since the fluorescence lifetimes of impurities are generally in the nanosecond time scale. 



techniques for biological applications, which can be easily 
extended to HTS assays. 

Resonance energy transfer 

Resonance energy transfer (RET; Figure 2) between a 
fluorophore and chromophore was one of the earliest 
methods developed for HTS. A peptide substrate for an 
HIV protease was synthesized with EDANS (at the amino 
terminus) as the donor fluorophore, and DABCYL (at 
the carboxyl terminus) as the acceptor chromophore [26]. 
Energy transfer from EDANS to DABCYL in the intact 
peptide resulted in quenching of EDANS fluorescence. 
On cleavage by HIV protease, the fluorescence of the 
cleaved tetrapeptide-EDANS was restored to the free 
fluorophore level. Using this assay, inhibitors of HIV 
protease activity were identified using a simple 'mix 
and measure' assay format [26]. Although a 40-fold 
enhancement of the fluorescence signal could be obtained 
in this assay, there are several disadvantages to the 
DABCYL-EDANS pair. Many organic and natural product 
compounds absorb around the absorption and emission 
maxima of EDANS (λabs = 340 nm, λem = 490 nm). These 
organic and natural product compounds can also quench 
the EDANS fluorescence, generating false positives. Any 
trace contamination of the peptide substrate with free 
EDANS would result in a high fluorescence background. 
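
The sixth-power distance dependence noted in the Figure 2 caption is usually written as the Förster relation, E = 1 / (1 + (r/R0)^6), where R0 is the donor-acceptor distance at which transfer is 50% efficient. The sketch below evaluates it; the R0 of 5 nm is a hypothetical value chosen for illustration and is not a measured figure for the EDANS-DABCYL pair.

```python
def fret_efficiency(r_nm, r0_nm):
    """Förster transfer efficiency for donor-acceptor separation r_nm,
    given the 50%-efficiency distance r0_nm."""
    if r_nm < 0 or r0_nm <= 0:
        raise ValueError("distances must be positive")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


if __name__ == "__main__":
    R0_NM = 5.0  # hypothetical Förster distance
    for r in (2.5, 5.0, 10.0):
        print(r, round(fret_efficiency(r, R0_NM), 3))
```

Doubling the separation from R0 drops the efficiency from 50% to under 2%, which is why cleaving the peptide that holds donor and quencher together restores donor fluorescence so sharply.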

Time-resolved fluorescence 

A new homogeneous time-resolved fluorescence (HTRF)
technology has been described [27]. The assay utilizes
fluorescence energy transfer between two fluorophores
(a europium cryptate and a 105 kDa phycobiliprotein,
allophycocyanin) as labels. The Eu-trisbipyridine cryptate
(TBP-Eu³⁺, λex = 337 nm) has two bipyridyl groups that
harvest light and channel it to the caged Eu³⁺. It has a
long fluorescence lifetime and nonradiatively transfers the
energy to allophycocyanin when the two labels are in close
proximity (>50% transfer efficiency at a donor-acceptor
distance of 9.5 nm). The resulting fluorescence of
allophycocyanin (λem = 665 nm) retains the long lifetime of the
donor TBP-Eu³⁺, allowing time-resolved measurement.
Both these labels and their spectroscopic characteristics
are very stable in biological media. Several homogeneous
in vitro biochemical assays based on these two labels
have been described [27]: binding of epidermal growth
factor (EGF) to its receptor, a Jun/Fos protein-protein
interaction, as well as a tyrosine kinase assay. Using
this concept, the first HTS assay for a protease enzyme
(herpes simplex virus type-1) was recently described by
Kolb et al. [28].
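
The distance dependence behind the quoted transfer efficiency can be sketched with the standard Förster relation, E = 1 / (1 + (r/R0)^6). The relation itself is not stated in the text and is brought in here as an assumption; the Förster radius R0 of 10 nm is a hypothetical value chosen only so that the efficiency at 9.5 nm exceeds 50%, consistent with the figure quoted above.

```python
def forster_efficiency(distance_nm, r0_nm):
    """Energy transfer efficiency for a donor-acceptor pair (Forster relation)."""
    return 1.0 / (1.0 + (distance_nm / r0_nm) ** 6)

R0_NM = 10.0  # hypothetical Forster radius, not given in the source

for r in (5.0, 9.5, 10.0, 15.0):
    e = forster_efficiency(r, R0_NM)
    print(f"r = {r:4.1f} nm  ->  E = {e:.2f}")
```

The steep sixth-power falloff is why the signal reports on whether the two labels are in close proximity: efficiency is exactly one half at r = R0 and drops quickly beyond it.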

Cell-based fluorescence assays

The above methodologies are not easily adapted to
cell-based assays. An interesting fluorescence resonance
energy transfer (FRET) procedure for sensing voltage
across cell membranes has been described recently,
however [29]. The technique uses membrane permeable,
anionic oxonols, which rapidly locate on the inner or
outer membrane surface depending on polarization state of
the membrane. FRET occurs between fluorescein-labeled
WGA and the oxonols bound to the outer surface of the
membrane at a resting negative potential. At a positive
potential, the oxonols are relocated to the inner membrane
surface, and the FRET is greatly reduced.



388 Analytical techniques



Many fluorescence intensity measurements, including
FRET, can be easily configured on a new instrument
specifically designed for cell-based HTS assays in 96-well
plates called FLIPR [30*]. FLIPR utilizes a water-cooled
argon ion laser (5 watt) or a xenon arc lamp and a
semiconfocal optical system with a charge-coupled device
(CCD) camera to illuminate and image the entire plate.
The spatial resolution of the optics is ~200 µm at the cell
plane. The plate chamber temperature can be controlled
precisely, and a 96-well pipettor head is integrated into the
instrument. These features allow accurate measurements
of cellular biochemistry in confluent layers of cells
at the bottom of plates. FLIPR software can rapidly
quantify transient fluorescence signals in intact cells that
are growing attached to the bottom of the well. HTS
assays involving intracellular calcium, pH and membrane
potential measurements have been designed using this
instrument [31].

Fluorescence polarization

Another technique that has gained popularity recently is
fluorescence polarization or anisotropy [32-34,35*,36,37,38*].
When fluorescently labeled molecules in solution are
illuminated with plane-polarized light, the emitted
fluorescence will be in the same plane provided the molecules
remain stationary. Since all molecules tumble as a result of
collisional motion, depolarization of fluorescence emission
occurs. This polarization phenomenon is proportional to
the rotational relaxation time (μ) of the molecule, which
is defined by the expression 3ηV/RT. At constant viscosity (η)
and temperature (T) of the solution, polarization is
directly proportional to the molecular volume (V) (R is
the universal gas constant). Hence changes in molecular
volume or molecular weight due to binding interactions
can be detected as a change in polarization. For example,
the binding of a fluorescently labeled ligand to its
receptor will result in significant changes in measured
fluorescence polarization values for the ligand. Once again,
the measurements can be made in a 'mix and measure'
mode without physical separation of the bound and free
ligands. The polarization measurements are relatively
insensitive to fluctuations in fluorescence intensity when
working in solutions with moderate optical intensity.
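
The dependence of the rotational relaxation time on molecular volume can be made concrete with a short calculation of 3ηV/RT. This is a sketch under stated assumptions: V is treated as a molar volume, the viscosity is that of a water-like buffer, and the two volumes (a small labeled ligand and a complex 50 times larger) are hypothetical values chosen for illustration.

```python
R_GAS = 8.314  # universal gas constant, J/(mol*K)

def rotational_relaxation_time(viscosity_pa_s, molar_volume_m3_per_mol, temperature_k):
    """Rotational relaxation time 3*eta*V/(R*T), in seconds."""
    return 3.0 * viscosity_pa_s * molar_volume_m3_per_mol / (R_GAS * temperature_k)

# Illustrative conditions (assumptions, not from the source text)
VISCOSITY_PA_S = 1.0e-3      # roughly water at room temperature
TEMPERATURE_K = 298.0
V_FREE = 7.3e-4              # m^3/mol, hypothetical small labeled ligand
V_BOUND = 50 * V_FREE        # hypothetical ligand-receptor complex

rho_free = rotational_relaxation_time(VISCOSITY_PA_S, V_FREE, TEMPERATURE_K)
rho_bound = rotational_relaxation_time(VISCOSITY_PA_S, V_BOUND, TEMPERATURE_K)
ratio = rho_bound / rho_free

print(f"free ligand:  {rho_free * 1e9:.2f} ns")
print(f"bound ligand: {rho_bound * 1e9:.2f} ns")
print(f"ratio:        {ratio:.0f}x")
```

At fixed viscosity and temperature the relaxation time scales linearly with volume, so the bound ligand tumbles far more slowly than the free one and retains more of the excitation polarization.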

A fluorescence polarization assay (FPA) for the
cytomegalovirus protease using a peptide substrate labeled
with biotin and 5-(4,6-dichlorotriazinyl)aminofluorescein
was reported recently [35*]. This assay is similar to the
SPA assay reported earlier [15*], except that the capture
reagent is avidin, and it is added to the enzyme substrate
mixture. High polarization values were observed when
the enzyme was inhibited and the uncleaved substrate
became complexed with avidin. Another HTS utilizing
an FPA involved the interaction of fluorescein-labeled
peptides containing phosphorylated tyrosine with Src-SH2
domains [38*]. In both cases, a 96-well plate reader
(FPM-2, Jolley Consulting and Research; Round Lake,
Illinois, USA) was used for the HTS. Signal from the
entire plate is read in about three minutes, making 50-100
plates/day assays quite feasible in HTS laboratories.
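
The throughput figures quoted above can be checked with simple arithmetic. The sketch below uses only the numbers given in the text (96 wells per plate, about three minutes of read time per plate, 50 to 100 plates per day); the function name is introduced here for illustration.

```python
WELLS_PER_PLATE = 96          # plate format named in the text
READ_MINUTES_PER_PLATE = 3    # "about three minutes" per plate

def daily_load(plates_per_day):
    """Return (wells measured, reader hours) for one day's plates."""
    wells = plates_per_day * WELLS_PER_PLATE
    reader_hours = plates_per_day * READ_MINUTES_PER_PLATE / 60
    return wells, reader_hours

low = daily_load(50)
high = daily_load(100)

print(f"50 plates/day:  {low[0]} wells, {low[1]} h of read time")
print(f"100 plates/day: {high[0]} wells, {high[1]} h of read time")
```

Even at the upper end the reader is occupied for only part of a working day, which is consistent with the statement that such assays are feasible.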

Fluorescence correlation spectroscopy

Fluorescence correlation spectroscopy (FCS) has been
recently described for HTS applications [39*,40,41]. FCS
measures time-dependent and spontaneous fluctuations in
fluorescence intensities in very small volumes (nanoliters).
These fluctuations usually result from Brownian motion
associated with chemical reactions, diffusion or the flow of
fluorescently labeled molecules. The average fluctuation
is proportional to the square root of N, where N is
the average number of molecules in the volume. Since
Brownian diffusion is directly affected by molecular
interactions, FCS is an excellent tool to measure binding
interactions [23]. Using powerful lasers and autocorrelation
techniques, sensitive measurements (at concentrations of
~10^[exponent illegible] M) can be made both in solution and in cellular
compartments. Access to this technology is limited since
this instrumentation for HTS is available only through
collaborative agreements on a semiexclusive basis [39*].
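
The square-root scaling of the fluctuations can be illustrated with a small simulation. This is a minimal sketch, not an FCS analysis: it assumes each of a fixed pool of molecules independently has a small chance of sitting in the observation volume at any instant, so the occupancy count is approximately Poisson distributed. The pool size, probability and sample count are arbitrary illustrative choices.

```python
import math
import random

def occupancy_counts(n_molecules, p_inside, n_samples, seed=0):
    """Simulate how many molecules sit in the observation volume at each sample."""
    rng = random.Random(seed)
    return [
        sum(1 for _ in range(n_molecules) if rng.random() < p_inside)
        for _ in range(n_samples)
    ]

def mean_and_std(values):
    """Sample mean and sample standard deviation."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return m, math.sqrt(var)

# Illustrative parameters (assumptions): average occupancy of about 10 molecules
counts = occupancy_counts(n_molecules=1000, p_inside=0.01, n_samples=2000)
mean_n, std_n = mean_and_std(counts)
relative_fluctuation = std_n / mean_n

print(f"mean N               = {mean_n:.2f}")
print(f"fluctuation (std)    = {std_n:.2f}")
print(f"sqrt(mean N)         = {math.sqrt(mean_n):.2f}")
print(f"relative fluctuation = {relative_fluctuation:.2f}")
```

Because the absolute fluctuation grows only as the square root of N, the relative fluctuation shrinks as N increases, which is why the method works best with very few molecules in a very small volume.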

Cell-based assay systems for HTS have been thoroughly
reviewed, with guidelines for selecting appropriate
screening systems [43**]. Assay systems using mammalian and
insect cells, as well as yeast and bacterial cells have
been described. The most common method for detecting
ligand interaction with drug targets expressed in cells
is to employ a reporter gene [3**,43**,45,46,49,50]. This
involves splicing the transcriptional control elements of a
target gene (a gene that controls the biological expression
and function of a disease target) with a coding sequence
of a reporter gene into a vector. This vector is then
transfected into a suitable cell line in order to construct
a detection system that responds to modulation of the
target. Common examples of reporter genes are enzymes
such as chloramphenicol acetyl transferase (CAT), alkaline
phosphatase (AP), firefly and bacterial luciferases, and
β-galactosidase. These enzymes can be detected at very
low levels using colorimetric, chemiluminescent or
bioluminescent products of specific substrates. The chemistry
of chemiluminescent and bioluminescent reactions has
been reviewed in detail [46,47].

A new reporter system using the β-lactamase enzyme with
a membrane permeable fluorogenic substrate has been
cited for cell-based assays [3**]. The advantage is that
the enzyme is monomeric and has no endogenous activity
in mammalian cells. Since fluorescent substrates are not
yet commercially available, this system is yet to be used
widely in HTS applications.

Future developments and conclusions

Several new trends can be observed in the recent HTS
literature [52-56,57**,58-69]. The use of 384-well plates
in HTS is being investigated [52], which would increase
throughput and reduce reagent cost. Statistical experimental
design tools are being explored to improve the robustness
of assays [53]. New recombinant microorganisms
are being studied to screen for non-antibiotic compounds
[54]. A sensitive colorimetric assay for in vitro molecular
recognition using polymeric artificial membranes has been
described [56,57**,58]. These membranes, which contain
a ligand, can be polymerized into liposomes. These
liposomes change their chromatic properties on binding
to a solubilized target such as a receptor. Developments
in scanning probe microscopy for screening and drug
development [59,60] are quite exciting because the
molecular interaction could be detected without labeling
the target or the ligand.

High-throughput screening Sittampalam, Kahl and Janzen 389

New analytical devices are also being developed. A
detection device based on an amperometric sensor chip
[62] and an amperometric electrode probe [63] has been
described. The microarray technology that has been
developed for analyzing gene expression [65], and other
analytical methods used in characterizing combinatorial
libraries [66-69], could be adapted for medium-throughput
screening applications.

The science of HTS is undergoing explosive growth due
to rapid developments in assay technology. Major trends
include the development of nonisotopic detection systems
and the use of cell-based assays. Miniaturization of assay
technologies coupled with automation of high-throughput
combinatorial synthesis is helping to set the stage for
screening 50-100,000 samples/day in an ultra-HTS mode.
Bioinformatics systems to collect, analyze, manipulate
and store the massive amount of data are also being
rapidly developed. When these capabilities are realized,
the multitude of targets derived from the human genome
effort can be screened, using large numbers of structurally
diverse libraries to generate selective and potent lead
compounds. It is also anticipated that the technologies
developed will greatly contribute to efficient design
of secondary and tertiary assays used to determine
structure-activity relationships. The net effect would be
the ready availability of multiple, high quality leads to
develop novel therapies for the treatment and prevention
of disease.



References and recommended reading

Papers of particular interest, published within the annual period of
review, have been highlighted as:

* of special interest

** of outstanding interest

1. Oliver SG: From DNA sequence to biological function. Nature
1996, 379:597-600.

2. Baum R: Combinatorial chemistry. Chem Eng News, February
13. 1900:38-54.

3.** Broach JR, Thorner J: High throughput screening for drug
discovery. Nature 1996, 384:14-16.

This is an excellent overview of high-throughput screening (HTS) with its
advantages and disadvantages in early drug discovery. A good account of
problems encountered with in vitro and cell-based assays is also given.
Future developments in HTS are concisely documented.



4.* Janzen WP: High throughput screening as a discovery tool in
the pharmaceutical industry. Lab Robotics Automation 1996,
8:261-265.

An industrial perspective on steps leading up to the implementation of
high-throughput screening is given in this paper. Topics covered include
organizational interactions, the screening paradigm employed, and how assays are
converted to screens.

5. Fernandes PB: Letter from the society president. J Biomol
Screening 1997, 2:1.

6.** Burbaum JJ, Sigal NH: New technologies for high-throughput
screening. Curr Opin Chem Biol 1997, 1:72-78.

An excellent review of new technologies for high-throughput screening. The
paper contains references on nonradioactive assay technologies, screening
methods for combinatorial libraries and issues associated with assay
miniaturization.

7.* Sittampalam GS, Iversen PW, Boadt JA, Kahl SD, Bright S,
Zock JM, Janzen WP, Lister MD: Design of signal windows in
high throughput screening assays for drug discovery. J Biomol
Screening 1997, 2:159-169.

This paper describes the concept of signal windows, which provides a
degree of separation between measured signals. The size of the signal window
is a critical performance parameter (in high-throughput screens) which
impacts the identification of active compounds ('hits') in the presence of
variability.

8. Hook D: Ultra high throughput screening - a journey into
Nanoland with Gulliver and Alice. Drug Discov Tech 1996,
1:267-268.

9. Braunwalder AF, Wennogle L, Gay B, Lipson KE, Sills MA:
Application of scintillation microtiter plates to measure
phosphopeptide interactions with GRB2-SH2 binding domain.
J Biomol Screening 1996, 1:23-26.

10. Cole JL: Approaches to high volume screening assays of viral
polymerases and related proteins. Methods Enzymol 1996,
275:310-328.

11.* Cook ND: Scintillation proximity assay: a versatile high
throughput screening technology. Drug Discov Tech 1996,
1:287-294.

A good account of the basic principles of the scintillation proximity assay and
its specific applications in high-throughput screening, with 70 references
cited.

12.* Kahl SD, Hubbard FR, Sittampalam GS, Zock JM: Validation
of a high throughput scintillation proximity assay for
5-hydroxytryptamine1E receptor binding activity. J Biomol
Screening 1997, 2:33-40.

This paper describes the development of a scintillation proximity assay for
receptor binding studies in high-throughput mode. Validation concepts and
factors affecting robustness of the assays in high-throughput screening
mode are addressed in detail.

13. Lerner CG, Chiang Saiki AY, Mackinnon AC, Xuei X:
High throughput screen for inhibitors of bacterial DNA
topoisomerase I using scintillation proximity assay. J Biomol
Screening 1996, 1:139-143.

14. Baker CA, Poorman RA, Kezdy FJ, Staples DJ, Smith CW,
Elhammer AP: A scintillation proximity assay for
UDP-GalNAc:polypeptide, N-acetylgalactosaminyltransferase. Anal
Biochem 1996, 239:20-24.

15.* Baum EZ, Johnston SH, Bebernitz GA, Gluzman Y: Development
of a scintillation proximity assay for human cytomegalovirus
protease using 33phosphorus. Anal Biochem 1996, 237:129-134.

This paper demonstrates the use of 33P as a label in a scintillation
proximity assay system. The authors demonstrate the utility of a simple 'mix and
measure' assay to screen for protease inhibitors.

16. Sullivan E, Hemsley P, Pickard A: Development of a scintillation
proximity assay for calcineurin phosphatase activity. J Biomol
Screening 1997, 2:19-23.

17. De Serres M, McNulty MJ, Christensen L, Zon G, Findlay JWA:
Development of a novel scintillation proximity competitive
hybridization assay for the determination of phosphorothioate
antisense oligonucleotide plasma concentrations in a
toxicokinetic study. Anal Biochem 1996, 233:228-233.

18. Sonatore LM, Wisniewski D, Frank LJ, Cameron PM, Hermes JD,
Marcy AI, Salowe SP: The utility of FK506-binding protein as a
fusion partner in scintillation proximity assays: application to
SH2 domains. Anal Biochem 1996, 240:289-297.






19. Chen T, Repetto B, Chizzonite R, Pullar C, Burghardt C, Dharm
E, Zhao Z, Carroll R, Nunes P, Basu M, et al.: Interaction of
phosphorylated FcεRIγ immunoglobulin receptor tyrosine
activation motif-based peptides with dual and single SH2
domains of p72syk. J Biol Chem 1996, 271:25308-25315.

20. Patel S, Harris A, O'Beirne G, Cook ND, Taylor CW: Kinetic
analysis of inositol triphosphate binding to pure inositol
triphosphate receptors using scintillation proximity assay.
Biochem Biophys Res Commun 1996, 221:821-825.

21.* Harris DW, Kenrick MK, Pither RJ, Anson JG, Jones DA:
Development of a high volume in situ mRNA hybridization
assay for the quantification of gene expression utilizing
scintillating microplates. Anal Biochem 1996, 243:249-256.

A unique application of scintillation proximity assay using scintillating
microplates to quantify mRNA in situ. This method detects mRNA transcripts
at the level of 10-30 copies/cell, and is 30-fold more sensitive than Northern
blotting. The authors demonstrate the utility of a high throughput approach
to quantify gene expression.

22. Fox S: Heralding a new era of cell-based assays. Pharm Forum
1988, 6:1-3.

23. Brown MP, Royer C: Fluorescence spectroscopy as a tool to
investigate protein interactions. Curr Opin Biotechnol 1997,
8:45-49.

24.* Glazer AN: Recent advances in fluorescence labeling, detection
and visualization. BioRadiations 1997, 98:4-6.

A good review of the fundamentals of fluorescence techniques for biological
measurements. Detection techniques described are easily amenable to
high-throughput screening assays. A section on recent advances describes
developments in fluorescence resonance energy transfer (FRET) and fluorescence
in situ hybridization (FISH) technologies, membrane potential measurements
and karyotyping human chromosomes. Over 40 references are cited on
various applications.



25. Czarnik AW: Desperately seeking sensors. Chem Biol 1995,
2:423-428.

26. Wang GT, Matayoshi E, Huffaker HJ, Krafft GA: Design and
synthesis of new fluorogenic HIV protease substrates based
on resonance energy transfer. Tetrahedron Lett 1991, 31:6493-6496.

27. Mathis G: Probing molecular interactions with homogeneous
techniques based on rare earth cryptates and fluorescence
energy transfer. Clin Chem 1995, 41:1391-1397.

28. Kolb JM, Yamanaka G, Manly SP: Use of a novel homogeneous
fluorescent technology in high throughput screening. J Biomol
Screening 1996, 1:203-210.

29. Gonzalez JE, Tsien RY: Voltage sensing by fluorescence
resonance energy transfer. Biophys J 1995, 69:1272-1280.

30.* Schroeder KS, Neagle BD: FLIPR: A new instrument for
accurate, high throughput optical screening. J Biomol Screening
1996, 1:75-80.

This paper describes an instrument that can simultaneously read
fluorescence signals from cells in all 96 wells of a microtiter plate. Kinetic updates
can be obtained in less than one second in all wells, and the equipment
allows for measurement of transient fluxes in intracellular Ca2+, pH and
membrane potential.



31. Waggoner A, Taylor L, Seadler A, Dunlay T: Multiparameter
fluorescence imaging microscopy: reagents and instruments.
Hum Pathol 1996, 27:494-502.

32. Jameson DM, Sawyer WH: Fluorescence anisotropy applied to
biomolecular interactions. Methods Enzymol 1995, 246:283-300.

33. Lundblad JR, Laurance M, Goodman RH: Fluorescence
polarization analysis of protein-protein interactions. Mol
Endocrinol 1996, 10:607-613.

34. Checovich WJ, Bolger RE, Burke T: Fluorescence polarization - a
new tool for cell and molecular biology. Nature 1995, 375:254-256.

35.* Levine LM, Michener ML, Toth MV, Holwerda BC: Measurement
of specific protease activity utilizing fluorescence polarization.
Anal Biochem 1997, 247:83-88.

This is the first report describing the use of fluorescence polarization in a
high-throughput mode using a modified 96-well plate reader. The peptide
substrate for the human cytomegalovirus protease was labeled with biotin
and DTAF (5-[(4,6-dichlorotriazin-2-yl)amino]fluorescein). The uncleaved
substrate, when bound to avidin, produced a high polarization value; hence, the
presence of inhibitors in the mixture can be easily identified.



36. Jolley ME: Fluorescence polarization assays for the detection
of proteases and their inhibitors. J Biomol Screening 1996,
1:33-38.

37. Schade SZ, Jolley ME, Sarauer GJ, Simonson LG: BODIPY-α-casein,
a pH-independent protein substrate for protease
assays using fluorescence polarization. Anal Biochem 1996,
243:1-7.

38.* Lynch BA, Loiacono KA, Tiong CL, Adams SE, MacNeil IA:
A fluorescence polarization based Src-SH2 binding assay. Anal
Biochem 1997, 247:77-82.

Application of fluorescence polarization in a high-throughput mode to a
protein-peptide interaction assay involving Src-SH2 domain.
Carboxyfluorescein without a linker was used as the label for the peptide probe to
minimize propeller effect. The assay tolerated up to 20% dimethyl sulfoxide
(DMSO), a common solvent used to dissolve compounds tested in
high-throughput screening.

39.* Sterrer S, Henco K: Fluorescence correlation spectroscopy
(FCS) - A highly sensitive method to analyze drug/target
interactions. J Recept Signal Transduct Res 1997, 17:511-520.

This paper describes the use of fluorescence correlation spectroscopy to
measure molecular interactions in a homogeneous mode. It has been
developed to accommodate measurements in 96- and 384-well microtiter plates,
and is a good candidate for high-throughput screening applications.

40. Rigler R: Fluorescence correlations, single molecule detection
and large number screening. Applications in biotechnology.
J Biotechnol 1995, 41:177-186.

41. Rauer B, Neumann E, Widengren J, Rigler R: Fluorescence
correlation spectrometry of the interaction kinetics of
tetramethylrhodamine α-bungarotoxin with Torpedo californica
acetylcholine receptor. Biophys Chem 1996, 58:3-12.

42. Sarubbi E, Yanofsky SD, Barrett RW, Denaro M: A cell-free,
nonisotopic, high-throughput assay for inhibitors of type-1
interleukin-1 receptor. Anal Biochem 1996, 237:70-75.

43.** Rose P, Gorman J, Kurtz S, Patel P, Fernandes P: The
successful partnership of biotechnology based screen
development with high throughput screening. Network
Science 1996, 2(Sept):1-12. On the World Wide Web URL:
http://www.awod.com/netsci/Science/Screening/feature06.html

A comprehensive review of cell-based assay systems available for
high-throughput screening applications. Targets reviewed include ion channels,
G-protein-coupled receptors, tyrosine kinase receptors, intracellular
receptors, protein-protein interactions and proteases. Over 40 references are
cited.



44. Dhundale A, Goddard C: Reporter assays in high throughput
screening laboratory: A rapid and robust first look? J Biomol
Screening 1996, 1:115-118.

45. Suto CM, Ignar DM: Selection of an optimal reporter gene
for cell-based high throughput screening assays. J Biomol
Screening 1997, 2:7-9.

46. Bronstein I, Fortin J, Stanley PE, Stewart GSAB, Kricka LJ:
Chemiluminescent and bioluminescent reporter gene assays.
Anal Biochem 1994, 219:169-181.

47. Hastings JW: Chemistries and colors of bioluminescent
reactions: a review. Gene 1996, 173:5-11.

48. Lehel C, Daniel-Issakani S, Brasseur M, Strulovici B: A
chemiluminescent microplate assay for sensitive detection of
protein kinase activity. Anal Biochem 1997, 244:340-346.

49. Kolb AJ, Neumann K: Luciferase measurements in high
throughput screening. J Biomol Screening 1996, 1:85-88.

50. Brann MR, Messier T, Dorman C, Lannigan D: Cell-based assays
for G-protein-coupled/tyrosine kinase coupled receptors.
J Biomol Screening 1996, 1:43-45.

51. Rizzuto R, Brini M, De Giorgi F, Rossi R, Heim R, Tsien RY, Pozzan
T: Double labelling of subcellular structures with
organelle-targeted GFP mutants in vivo. Curr Biol 1996, 6:183-188.

52. Janzen B, Domanico P: The 384 well plate: pros and cons.
J Biomol Screening 1996, 1:63-64.

53. Lutz MW, Menius JA, Choi TD, Laskody RG, Domanico PL,
Goetz AS, Saussy DL: Experimental design for high-throughput
screening. Drug Discov Tech 1996, 1:277-286.

54. Klein RD, Geary TG: Recombinant microorganisms as tools
for high throughput screening for non antibiotic compounds.
J Biomol Screening 1997, 2:41-49.






55. Webb SA, Hurskainen P: Transcription-specific assay for
quantifying mRNA: A potential replacement for reporter gene
assays. J Biomol Screening 1996, 1:119-121.

56. Charych DH, Nagy JO, Spevak W, Bednarski MD: Direct
colorimetric detection of a receptor-ligand interaction by a
polymerized bilayer assembly. Science 1993, 261:585-588.

57.** Charych D, Cheng Q, Reichert A, Kuziemko G, Stroh M, Nagy JO,
Spevak W, Stevens RC: A 'litmus test' for molecular recognition
using artificial membranes. Chem Biol 1996, 3:113-120.

A unique, membrane-based colorimetric system that detects molecular
interactions is described. Gangliosides that specifically bind cholera toxin,
Escherichia coli enterotoxin and botulinum neurotoxin were incorporated into
a polydiacetylene membrane. The polymerized membrane containing
gangliosides is blue, and turns red when the toxin is bound. The response is
sensitive, specific and selective. An excellent technology for high-throughput
screening applications.

58. Spevak W, Foxall C, Charych DH, Dasgupta F, Nagy JO:
Carbohydrates in an acidic multivalent assembly: nanomolar
P-selectin inhibitors. J Med Chem 1996, 39:1018-1020.

59. Allen S, Davies MC, Roberts CJ, Tendler SJB, Williams PM:
Atomic force microscopy in analytical biotechnology. Trends
Biotechnol 1997, 15:101-105.

60. Troy CT, Abrams SB: Scanning force microscopy helps in the
design of cancer drugs. Biophoton Int 1996, Sept/Oct:53-63.

61. Paborsky LR, Dunn KE, Gibbs CS, Dougherty JP: A nickel chelate
microtiter plate assay for six histidine containing proteins. Anal
Biochem 1996, 234:60-65.

62. Weiss-Wichert C, Smetazko M, Valina-Saba M,
Schalkhammer TH: A new analytical device based on gated
ion channels: A peptide channel biosensor. J Biomol Screening
1997, 2:11-18.

63. Brecht A, Burckardt R, Rickert J, Stemmler I, Schuetz A,
Fischer S, Friedrich T, Gauglitz G, Goepel W: Transducer-based
approaches for parallel binding assays in HTS. J Biomol
Screening 1996, 1:191-201.

64. Tyagi S, Kramer FR: Molecular beacons: probes that fluoresce
upon hybridization. Nat Biotechnol 1996, 14:303-308.

65. Heller RA, Schena M, Chai A, Shalon D, Bedilion T, Gilmore J,
Woolley DE, Davis RW: Discovery and analysis of inflammatory
disease-related genes using cDNA microarrays. Proc Natl Acad
Sci USA 1997, 94:2150-2155.

66. Nicolaou KC, Xiao XY, Parandoosh Z, Senyei A, Nova MP:
Radiofrequency encoded combinatorial chemistry. Angew
Chem Int Ed 1995, 34:2289-2291.

67. Fitzgerald MC, Harris K, Shevlin CG, Siuzdak G: Direct
characterization of solid phase resin-bound molecules by
mass spectrometry. Bioorg Med Chem Lett 1996, 6:979-982.

68. Chu YH, Dunayevskiy YM, Kirby DP, Vouros P, Karger BL: Affinity
capillary electrophoresis-mass spectrometry for screening
combinatorial libraries. J Am Chem Soc 1996, 118:7827-7835.

69. Evans DM, Williams KP, McGuinness B, Tarr G, Regnier F,
Afeyan N, Jindal S: Affinity based screening of combinatorial
libraries using automated, serial-column chromatography. Nat
Biotechnol 1996, 14:504-507.





Exhibit 34 



Proc. Natl. Acad. Sci. USA
Vol. 96, pp. 10984-10991, September 1999 
Colloquium Paper 





This paper was presented at the National Academy of Sciences colloquium "Proteolytic Processing and Physiological
Regulation," held February 20-21, 1999, at the Arnold and Mabel Beckman Center in Irvine, CA.



The structure of the human βII-tryptase tetramer: Fo(u)r better
or worse

Christian P. Sommerhoff*†, Wolfram Bode‡, Pedro J. B. Pereira‡, Milton T. Stubbs§,
Jörg Stürzebecher¶, Gerd P. Piechottka*, Gabriele Matschiner*, and Andreas Bergner‡

*Abteilung Klinische Chemie und Klinische Biochemie in der Chirurgischen Klinik und Poliklinik, Klinikum Innenstadt der Ludwig-Maximilians-Universität,
Nußbaumstrasse 20, D-80336 Munich, Germany; ‡Abteilung für Strukturforschung, Max-Planck-Institut für Biochemie, Am Klopferspitz 18a, D-82152 Martinsried,
Germany; §Institut für Pharmazeutische Chemie der Philipps-Universität Marburg, Marbacher Weg 6, D-35032 Marburg, Germany; and ¶Klinikum der Universität
Jena, Zentrum für Vaskuläre Biologie und Medizin, Nordhäuserstrasse 78, D-99089 Erfurt, Germany



ABSTRACT Tryptases, the predominant serine protein- 
ases of human mast cells, have recently been implicated as 
mediators in the pathogenesis of allergic and inflammatory 
conditions, most notably asthma. Their distinguishing fea- 
tures, their activity as a heparin-stabilized tetramer and 
resistance to most proteinaceous inhibitors, are perfectly 
explained by the 3-Å crystal structure of human βII-tryptase
in complex with 4-amidinophenylpyruvic acid. The tetramer
consists of four quasiequivalent monomers arranged in a flat 
frame-like structure. The active centers are directed toward a 
central pore whose narrow openings of approximately 40 Å ×
15 Å govern the interaction with macromolecular substrates
and inhibitors. The tryptase monomer exhibits the overall fold 
of trypsin-like serine proteinases but differs considerably in 
the conformation of six surface loops arranged around the 
active site. These loops border and shape the active site cleft 
to a large extent and form all contacts with neighboring 
monomers via two distinct interfaces. The smaller of these 
interfaces, which is exclusively hydrophobic, can be stabilized 
by the binding of heparin chains to elongated patches of 
positively charged residues on adjacent monomers or, alter- 
natively, by high salt concentrations in vitro. On tetramer 
dissociation, the monomers are likely to undergo transforma- 
tion into a zymogen-like conformation that is favored and 
stabilized by intramonomer interactions. The structure thus 
provides an improved understanding of the unique properties 
of the biologically active tryptase tetramer in solution and will 
be an incentive for the rational design of mono- and multi- 
functional tryptase inhibitors. 



Human mast cell tryptases (EC 3.4.21.59) comprise a family of
trypsin-like serine proteinases closely related in sequence that
are derived from >3 nonallelic genes (1, 2). Tryptases (at least
isoenzymes αI, βI, βII, and βIII) are highly and selectively
expressed in mast cells and to a lesser extent in basophils (3,
4). Only β-tryptases, however, appear to be activated
intracellularly and stored in secretory granules (5, 6), accumulating
to much larger amounts than any other of the
granule-associated serine proteinases of leukocytes and lymphocytes.
On mast cell activation, β-tryptases are secreted bound to
heparin in diverse allergic and inflammatory conditions
ranging from asthma and rhinitis to psoriasis and multiple sclerosis.
Various studies performed in animals and humans have pro- 
vided considerable evidence that tryptases are directly in- 
volved in the pathogenesis of asthma (7-9), a hypothesis also 
supported by apparent genetic links of tryptases to airway 
reactivity (10, 11). 



PNAS is available online at www.pnas.org. 



Several unique properties distinguish tryptases from other 
trypsin-like proteinases (reviewed in refs. 12 and 13). Most 
notably, tryptases are enzymatically active in the form of a 
noncovalently linked tetramer. The tetramer is stabilized by 
association with negatively charged aminoglycans such as 
heparin or high ionic strength conditions in vitro. On dissoci- 
ation, reversible only under certain conditions, the monomers 
lose activity, apparently because of transition into a zymogen- 
like state (14, 15). This mechanism is thought to govern 
tryptase activity in vivo. With the exception of the "atypical"
Kazal-type inhibitor leech-derived tryptase inhibitor (LDTI)
(16, 17), human tryptases are resistant to inhibition by
proteinaceous inhibitors. In accordance with their trypsin-like
activity, tryptases efficiently hydrolyze a number of peptide 
substrates including the neuropeptides "vasoactive intestinal 
peptide" and "peptide histidine methionine" (18). Few macro- 
molecular substrates are cleaved, however, leading to the 
activation of prostromelysin, prourokinase, and the protein- 
ase-activated receptor-2 (19-21) and the inactivation of fi- 
bronectin and of the procoagulant functions of high molecular- 
mass kininogen and fibrinogen (22-24). 

These distinguishing features are well explained by the
crystal structure of the human lung βII-tryptase tetramer,
whose overall architecture has been summarized recently (25).
Here, we describe the identification of the tetramer within the 
crystal packing, the detailed structure of the monomers, and 
their interactions in the tetramer. In addition, structural 
features likely to favor a zymogen-like conformation of iso- 
lated monomers and models of the interaction with stabilizing 
heparin proteoglycans and inhibitors are presented. 

Identification of the Relevant Tryptase Tetramer. In the x-y
plane of the tryptase crystals, the tryptase monomers are
arranged in flat rectangular tetrameric aggregates that form
extended protein layers (Fig. 1a). Within these layers, each
tetramer is rotated about the crystallographic a- and b-axes by
≈7°, in agreement with the self-rotation function. The tetramers
appear well separated from their neighbors in one
direction (x-direction in Fig. 1a) but are in somewhat closer
contact in the perpendicular direction (y in Fig. 1a). In the
z-direction, the tetramers are stacked along the crystallographic
4₁ screw axis. Because of the 7° tilt of each tetramer
from the x-y plane, their projections (Fig. 1b) alternate between
leaning to the left, being horizontal, and leaning to the
right, respectively, giving rise to a 7° precession motion of the



Abbreviations: APPA, 4-amidinophenylpyruvic acid; LDTI, leech- 
derived tryptase inhibitor. 

Data deposition: The atomic coordinates have been deposited in the 
Protein Data Bank, www.rcsb.org (PDB ID code 1A0L).
To whom reprint requests should be addressed. E-mail: sommerhoff@clinbio.med.uni-muenchen.de.






Colloquium Paper: Sommerhoff et al.
Proc. Natl. Acad. Sci. USA 96 (1999) 10985





Fig. 1. Packing of the human βII-tryptase crystal. (a) View along the z-axis showing one layer of tryptase molecules in the x-y plane. The tryptase
monomers are grouped into tetrameric aggregates that form extended sheets. Each of these tryptase tetramers is clearly delimited from its neighbors
in both directions. A "reference" tetramer is shown in red for simplicity. (b) View across the z-axis. In the z direction, layers of tetramers are stacked
on each other along the 4₁ screw axis. The local 2-fold symmetry axis is tilted from the z direction by ≈7°, causing increased crystal-stabilizing
contacts between layers stacked in the z-direction. One unit cell (82.9 × 82.9 × 172.9 Å), occupied by four tryptase tetramers, is indicated by a white
bordered box. 



local (2-fold; see below) rotation axis along the crystallographic
4₁ screw axis. The largely complementary interaction
surfaces between the monomers of the tetramer are typical for
intersubunit contacts, whereas neighboring tetramers interact
with one another via much more usual crystal contacts. Thus,
within a tetramer, monomer A (Fig. 2) interacts with monomers
B and D via interfaces of sizes 540 and 1,075 Å²,
respectively (solvent inaccessible surface probed by using a
sphere of 1.4-Å radius; Collaborative Computational Project
No. 4 suite). In contrast, the four monomers of one given
tetramer interact with monomers from neighboring tetramers
via interfaces of less than 280 Å² (in the x-y plane) and 265 Å²
(along the z-axis), respectively. The contacts between tetra- 
mers include a number of hydrogen bonds and six unique salt 
bridges and thus are qualitatively similar to those usually 
observed in typical crystal contacts. 
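
The size comparison made in this paragraph can be spelled out with a few lines of arithmetic. The following Python sketch uses only the interface areas quoted above; the dictionary and function names are illustrative and are not part of the original analysis. Because the inter-tetramer values are quoted as upper bounds, the ratios it prints are lower bounds.

```python
# Buried interface areas quoted in the text, in square angstroms.
INTRA_TETRAMER = {"A-B": 540.0, "A-D": 1075.0}
# Upper bounds quoted for contacts between neighboring tetramers.
INTER_TETRAMER = {"x-y plane": 280.0, "z-axis": 265.0}


def area_ratios(intra, inter):
    """Ratio of each intra-tetramer interface to the largest crystal contact."""
    largest_contact = max(inter.values())
    return {name: area / largest_contact for name, area in intra.items()}


ratios = area_ratios(INTRA_TETRAMER, INTER_TETRAMER)
for name, ratio in ratios.items():
    print(f"{name}: at least {ratio:.1f} times the largest inter-tetramer contact")
```

On these figures the weaker A-B interface is still roughly twice as large as any contact between neighboring tetramers, and the A-D interface nearly four times as large.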

These packing considerations suggest that the tetramer 
emphasized in Fig. 1 represents the enzymatically active 
tetramer of human β-tryptase. This tetramer selection is
supported by the finding that the six loops that deviate most 
from the structures of other trypsin-like proteinases are all 
involved in forming monomer-monomer contacts within a 
tetramer. More important, this unique tetramer perfectly 
explains the distinguishing properties of tryptase in solution, 
e.g., the resistance to proteinaceous inhibitors other than 
LDTI, the unusual substrate specificity, and the stabilization 
by the binding of heparin-like glycosaminoglycans (see below). 

Overall Tetramer Structure. In the tryptase tetramer, 
monomers (arbitrarily assigned A, B, C, and D in Fig. 2) are 
positioned at the corners of a flat rectangular frame leaving a 
continuous central pore. The tetramer displays almost perfect 
222 symmetry that, however, is not exact because of the 
crystallographically asymmetric environment and an imperfect 



internal packing (see below). The horizontal and the vertical 
2-fold axes, which cross each other in the center of the 
tetramer, relate monomers A to B and C to D, or A to D and 
B to C, respectively. The third 2-fold symmetry axis relating 
monomers A to C and B to D is arranged virtually perpendicular
to the other 2-fold axes and runs almost through their
point of intersection in the central pore. 
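
The monomer pairings listed for the three 2-fold axes are those expected for ideal 222 symmetry, in which applying any two of the 2-fold operations in succession is equivalent to the third. A minimal Python sketch of this consistency check follows. The variable names and the assignment of the axes to x, y, and z are assumptions made for illustration, and the sketch ignores the small deviations from exact symmetry noted above.

```python
# Monomer pairings stated in the text for the three local 2-fold axes.
HORIZONTAL = {"A": "B", "B": "A", "C": "D", "D": "C"}
VERTICAL = {"A": "D", "D": "A", "B": "C", "C": "B"}
THIRD = {"A": "C", "C": "A", "B": "D", "D": "B"}


def compose(first, second):
    """Apply `first`, then `second`, to every monomer label."""
    return {monomer: second[first[monomer]] for monomer in first}


# Diagonal entries of 2-fold rotation matrices about three mutually
# perpendicular axes (assumed here to lie along x, y, and z).
C2_X = (1, -1, -1)
C2_Y = (-1, 1, -1)
C2_Z = (-1, -1, 1)


def multiply_diagonal(a, b):
    """Product of two diagonal matrices, each given by its diagonal."""
    return tuple(x * y for x, y in zip(a, b))


# Both comparisons check that two 2-fold operations combine into the third.
print(compose(HORIZONTAL, VERTICAL) == THIRD)
print(multiply_diagonal(C2_X, C2_Y) == C2_Z)
```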

The active centers of the four monomers are directed toward 
the central pore (Fig. 2). This pore exhibits a rectangular cross 
section and is twisted by ≈30° about the tetramer axis. It
possesses two narrow openings of dimension 40 Å × 15 Å, and
widens in its central part to a cross section of 50 Å × 25 Å, just
large enough for elongated peptides of the diameter of an
α-helix to thread through the exits and to interact with the
active sites. Both pore entrances are partially obscured by the 
147-loops (see below), which project from each of the mono- 
mers but on alternative entrance sides, so that only two 
diagonally arranged active centers can be viewed directly (Fig. 
2). With 33 basic (including 12 His residues) and 24 acidic 
residues per monomer, human tryptase exhibits an average 
percentage of charged residues comparable to related serine 
proteinases, but is only slightly positively charged at neutral 
pH. These charges are not evenly distributed along the molecular
surface, however. Rather, negatively charged residues
cluster preferentially on the inner pore-facing surface, con- 
ferring the pore with a quite negative electrostatic potential, 
and along the peripheral A-D (and B-C) edges. In contrast, 
the A-B (and C-D) peripheries and one front side of the 
monomer surface are positively charged and probably are 
involved in heparin binding (see below and Fig. 6). 
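
The steric argument about the pore can be illustrated with a rough size check that treats a ligand as a rigid cylinder and compares its diameter with the narrower dimension of each cross section. In the Python sketch below the pore dimensions are those given in this paragraph; the 12 Å helix diameter and the 30 Å globular protein are assumed round figures for illustration, not values from the text, and the twist of the pore and the 147-loops at the entrances are ignored.

```python
# Pore cross sections quoted in the text, in angstroms.
PORE_OPENING = (40.0, 15.0)
PORE_CENTER = (50.0, 25.0)

# Assumed, illustrative ligand diameters in angstroms (not from the text).
ASSUMED_HELIX_DIAMETER = 12.0
ASSUMED_GLOBULAR_DIAMETER = 30.0


def fits_through(cross_section, diameter):
    """True if a rigid cylinder of this diameter passes the rectangular section."""
    return min(cross_section) >= diameter


for label, diameter in [("alpha-helix", ASSUMED_HELIX_DIAMETER),
                        ("globular protein", ASSUMED_GLOBULAR_DIAMETER)]:
    passes = fits_through(PORE_OPENING, diameter) and fits_through(PORE_CENTER, diameter)
    print(f"{label} ({diameter:.0f} A): {'fits' if passes else 'excluded'}")
```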

Monomer Structure. The tryptase monomer exhibits the 
typical β-strand-dominated fold seen in other trypsin-like
serine proteinases. The core is made by two six-stranded 







Fig. 2. Overall structure of the tryptase tetramer. The four 
monomers A, B, C, and D (clockwise) are shown as blue, red, green, 
and yellow ribbons, each surrounded by a semitransparent surface. The 
inhibitor molecules APPA are given as orange CPK models, each
binding into one of the four S1 specificity pockets.

β-barrels that are packed together and further clamped by
three transdomain segments (Fig. 3). This core structure is
covered by a number of polypeptide loops, a short α-helical
turn (Ala-55-Gly-66, not shown in Fig. 3a), and two regular
α-helices, the so-called "intermediate helix" (Glu-164-Leu-173A)
and the C-terminal helix (Arg-230-Val-242). The catalytic
residues Ser-195, His-57, and Asp-102 (chymotrypsinogen
numbering) are located in the junction between both barrels. 
The active-site cleft runs perpendicular to this barrel junction. 
In the "standard orientation" shown in Fig. 3, this cleft runs 
approximately horizontally across the molecular surface facing 
the viewer and is ready to accommodate and bind extended 
peptide substrates extending from left to right. One hundred 
sixty-two and 168 residues of the tryptase monomer are 
topologically equivalent to the archetypal proteinases chymotrypsin
(26) and trypsin (27), respectively, with an rms deviation
of their α-carbon atoms of 0.65 Å for both comparisons.
The numbering of the tryptase residues given in this article is 
predominantly based on the equivalence with chymotryp- 
sinogen (28) and at only a few trypsin-characteristic sites on 
that with trypsin (27). 

In detail, however, the topology of the tryptase monomers 
deviates significantly from these reference proteinases (Fig. 
3b), probably more than any other trypsin-like serine proteinase.
In particular, six surface loops that border and shape the
active-site cleft are unique (Fig. 3a). These loops comprise the
147-loop (including the 152-"spur"), the 70- to 80-loop, the
37-loop, the 60-loop, the 97-loop, and the 173-loop (Fig. 3a).
The 147-loop, which together with Gln-192 forms the rather 
acidic southern wall of the active-site cleft, is shortened by one 
residue in its initial part, but contains a two-residue insertion 
(Pro-152-Pro-152A-cisPro-152B-Phe-153-Pro-154) in its
proline-rich and hydrophobic 152-spur. The neighboring 70- to 
80-loop to the east, which in the calcium-binding serine 
proteinases winds around a stabilizing calcium ion (27), is 
three residues shorter and more compact in tryptase. It is 
probably not designed for calcium binding, in spite of topo- 
logically similar liganding groups; Glu-70 and Asp-80, involved 
in a partially buried salt bridge cluster with Arg-34, are 







Fig. 3. The tryptase monomer in standard orientation, i.e., as seen 
approximately from the middle of the central pore of the tetramer 
toward the active site of monomer A (represented by Ser-195, His-57,
and Asp-102). (a) Ribbon representation of a tryptase monomer. The
amidino group of the APPA molecule interacts with Asp-189 in the S1
pocket. Ser-195 Oγ is bound covalently to the APPA carbonyl group
forming a hemiketal. The six unique surface loops of tryptase that
surround the active site and are engaged in intermonomer contacts are
shown in special colors, namely (anticlockwise) the 147-loop (light
blue), the 70- to 80-loop (yellow), the 37-loop (orange), the 60-loop
(magenta), the 97-loop (green), and the 173-flap (red). All other
tryptase segments are given in dark blue. The side chains of the
catalytic triad residues as well as Asp-143, Asp-145, and Asp-147 in the
acidic 147-loop are shown as a ball-and-stick model. (b) Overlay of the
structures of the tryptase monomer and bovine trypsin, both given as 
ropes. The color-coding of tryptase is as in a, whereas trypsin is shown 
in gray. The most relevant deviations from the trypsin backbone 
appear in the colored loop regions of tryptase. 

oppositely arranged to the two calcium-binding Glu residues in 
trypsin. The 37-loop, above the 70- to 80-loop, possesses two 
additional residues (Pro-37A and Tyr-37B), which bulge away 
from the loop axis. The adjacent 60-loop, with five inserted 
residues, turns away from the cleft abruptly to the north, where 
it kinks at cisPro-60A to approach the general main chain 
course of other serine proteinases. At position 69, a buried Arg 
replaces the Gly residue that is strictly conserved in most other 
homologous proteinases, allowing for a special conformation. 
Although the 97-loop, at the northern rim of the cleft, contains 
the same number of residues as other serine proteinases, it 
differs considerably in conformation. The N-terminal part is 
shortened by two residues between positions 96 and 97, thus 
placing Ala-97 in the position normally occupied by residue 99,






whereas its C-terminal part makes an unusual extra helical turn 
before arriving at Asp-102. By far the largest insertion, with 
nine residues, occurs in the 173-loop. After the unusually long 
three-turn intermediate helix, the 10 residues from His-173 to 
Val-173I form an exposed flap centered around the imidazole
side chain of His-173. 

With 245 amino acid residues, the tryptase monomer pos- 
sesses 15 and 22 residues more than the B-chains of chymo- 
trypsin and trypsin, respectively. Compared with chymotryp- 
sinogen, most of these extra residues present in all tryptases 
known so far are inserted in the 37-loop (two residues), the 
60-loop (+5), the 147-loop (+1), the 173-loop (+9), at position
221A (+1) and at the C terminus (+1), whereas the 70- to
80-loop (-3) and the 214- to 220-loop (-1, as in all trypsin-like
serine proteinases) are shorter. On the reverse side, the largely 
hydrophobic cluster of four Trp residues (Trp-27, -29, -207, and 
-137) is noteworthy. Only the indole moieties of the latter two 
Trp are significantly exposed to the surface. At the C terminus, 
only the main chain atoms of the two penultimate residues 
Lys-244 and Lys-245 are well defined by electron density, while 
the C-terminal Pro-246 could not be located. The side chain of 
the single N-linked sugar attachment site in human βII-tryptase,
Asn-204, extends away from the molecular surface
opposite to the active site. Some residual electron density 
exists distal to its carboxamide group, which is not large 
enough to account for a covalently linked sugar residue. 
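
The insertion and deletion counts listed in this paragraph can be tallied to check that they account for the stated 15-residue excess over the chymotrypsin B-chain. The Python sketch below does this bookkeeping and also enumerates the insertion-code labels implied for the 173-flap (His-173 through residue 173I). The dictionary keys and helper name are illustrative; only the numbers come from the text.

```python
# Length changes relative to chymotrypsinogen, as listed in the text.
LENGTH_CHANGES = {
    "37-loop": +2,
    "60-loop": +5,
    "147-loop": +1,
    "173-loop": +9,
    "position 221A": +1,
    "C terminus": +1,
    "70- to 80-loop": -3,
    "214- to 220-loop": -1,
}

net_change = sum(LENGTH_CHANGES.values())
print(f"net change versus chymotrypsin B-chain: {net_change:+d} residues")


def insertion_labels(position, last_code):
    """Residue labels from `position` through `position` + `last_code`."""
    codes = [chr(c) for c in range(ord("A"), ord(last_code) + 1)]
    return [str(position)] + [f"{position}{code}" for code in codes]


flap = insertion_labels(173, "I")
print(f"173-flap: {len(flap)} residues, {flap[0]} to {flap[-1]}")
```

The tally gives a net gain of 15 residues, and a flap running from 173 to 173I comprises 10 residues, nine of which carry insertion codes, in line with the counts given in the text.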

As found in almost all trypsin-like serine proteinases [ex- 
cept, e.g., single-chain tissue type plasminogen activator (29)], 
the N-terminal Ile-16-Val-17 segment is inserted in the Ile-16
pocket, forming a solvent inaccessible salt bridge between its
free Ile-16 α-amino group and the carboxylate group of
Asp-194. The formation of this salt bridge after activation
cleavage creates a functional substrate recognition site by
reorienting the Asp-194 side chain from an external position in
the zymogen, where it might hydrogen bond to a surface
located His-40-Ser-32 pair forming the so-called "zymogen
triad," to an internal position in the active molecule (30, 31).
This reorientation restructures the surrounding "activation
domain," which in trypsin(ogen) mainly includes the linings of
the Ile-16 pocket and the S1 specificity pocket (i.e., segments
Ile-16-Gly-19, Tyr-184-Asp-194, Gly-216-Asn-223, and
Gly-142-Tyr-151), and the "oxyanion hole" formed by the amide
groups of Gly-193 and Ser-195 (28, 32, 33). The single-chain
zymogen and the activated monomer are adequately described 
by a two-state model, in which an inactive conformation is in 
equilibrium with an active form possessing a structured acti- 
vation domain (31). The partition between both forms depends 
on environmental conditions such as the endogenous free 
Ile-16-Val-17 N-terminal segment (34), free Ile-Val dipeptide
(31), ligands in the substrate binding site (30, 36), or other
effectors such as fibrin with respect to tissue plasminogen
activator or tissue factor in the case of Factor VIIa (29, 37).
This conformational partition can be influenced by internal 
molecular groups that stabilize or destabilize one or the other 
state. Tryptase possesses the zymogen triad residues His-40 
and Ser-32, which would stabilize the zymogen state. In 
addition, the acidic residues Asp-143, Asp-145, and Asp-147 
arranged around the Ile-16 cleft could form a negatively 
charged anchoring site that could compete with the Ile-16
pocket for the Ile-16 α-amino group, thus destabilizing the
structured active state of the tryptase monomer. Furthermore, 
some of the loops in contact with the activation domain of 
tryptase, such as the long 173-loop or the 70- to 80-loop, which 
has been shown to be strongly correlated with the equilibrium 
state in bovine elastase "subunit III" (38), could influence the 
structured state. The conformation of the tryptase 173-loop, 
probably held in place in the tetramer by contacts with 
monomer D, certainly has an effect on the stability of the 
integrated monomer. Interestingly, tissue factor, thought to 
support insertion of the N-terminal Ile-16 α-amino terminus of



activated Factor VIIa B-chain on complex formation (37),
likewise binds to the 173-loop at the intermediate helix 
flank (39). 
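
The two-state description used above implies a simple relation between the free-energy difference of the two conformations and the fraction of molecules in the active form. The Python sketch below evaluates that relation. The free-energy values are hypothetical and chosen only to show the trend, since the text reports no such numbers, and the sign convention (active minus inactive) and temperature are assumptions.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ / (mol K)


def fraction_active(delta_g_kj_per_mol, temperature_k=298.15):
    """Fraction in the active state for a two-state equilibrium.

    delta_g_kj_per_mol is G(active) - G(inactive); negative values
    favor the active conformation.
    """
    equilibrium_constant = math.exp(-delta_g_kj_per_mol / (GAS_CONSTANT * temperature_k))
    return equilibrium_constant / (1.0 + equilibrium_constant)


# Hypothetical free-energy differences, for illustration only.
for delta_g in (-10.0, 0.0, 10.0):
    print(f"dG = {delta_g:+.0f} kJ/mol -> fraction active = {fraction_active(delta_g):.3f}")
```

In this picture, groups that stabilize the zymogen-like state raise the free-energy difference and lower the active fraction, whereas contacts that hold the activation domain in place, such as those made within the tetramer, would act in the opposite direction.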

Interfaces. All monomer-monomer contacts within the 
tetramer are realized via six loops arranged around the active 
center. These loops, emphasized by special colors in Figs. 3-5, 
differ fundamentally in their conformation and partly in size 
from those of other trypsin-like serine proteinases. Monomers 
A and B interact with one another through the 147-loop, the 
70- to 80-loop, and the 37-loop (Fig. 4d). Each 152-spur slots
into a cleft formed by the 37- and the 70- to 80-loop of its own 
monomer and the 152-spur of the opposing neighbor. At the 
center of the interface, the side chains of Phe-153 and Tyr-75 
from each subunit form an approximate tetrahedron (Fig. 5a). 
The side chain of Tyr-75 from monomer B (D) would clash 
with the equivalent A (C) side chain if they were arranged in 
a symmetrical manner. Instead, the phenolic group of Tyr-75 
of monomer A turns in the opposite direction, breaking the 
2-fold symmetry (see the partial electron density in Fig. 5a).
This A-B (C-D) interface is exclusively hydrophobic, with a 
remarkable number of Tyr and Pro side chains involved, and 
lacks any intermonomer hydrogen bonds. Toward the pore, the 
side chains of the two Arg-150 residues oppose one another. 
The charges of their guanidyl groups presumably make unfa- 
vorable energy contributions to the A-B interaction. 

Monomer A interacts with monomer D through the entire 
northern rim consisting of the 173-flap, the 97-loop, and the 
60-loop (Figs. 4a and 5b), again via equivalent loops. Both
97-loops rest with their 95-99 segments on one another (Fig.
4a), with both Ile-99 side chains in direct contact. Further
toward both peripheries, segment Pro-60A-Asp-60B and the 
opposing segment Gly-173B-Tyr-173D run antiparallel to one 
another, forming two-rung antiparallel ladders between
Gly-173B-Tyr-173D and Pro-60A-Val-60C (Fig. 5b). Each Tyr-95
aromatic side chain nestles into the bend of the opposing 
173-flap, and each Tyr-173D phenolic side chain slots into a 
hydrophobic cleft made by the 60-loop and the 97-loop of the 
opposing monomer. In addition, both monomers are cross- 
connected by salt bridges between Asp-60B and Arg-224 and 




Fig. 4. Loop arrangements in the tetramer. The six special loops 
engaged in monomer-monomer interactions are shown in the color 
coding introduced in Fig. 3. (a) The D-A dimer as seen from outside 
of the tetramer along the local 2-fold axis. (b) The monomer viewed
in standard orientation. (c) Front view of the tetramer. (d) The A-B
dimer seen from outside of the tetramer along the local 2-fold axis. 








Fig. 5. Stick representation of the contact interfaces between monomers. (a) The AB-interface seen from inside the tetramer along the local
2-fold axis, shown together with the final 2Fo-Fc electron density map for both Tyr-75 side chains contoured at 1 σ level. The monomers and loops
are given in the color coding introduced in Figs. 3 and 4. (b) The AD-interface (half side) observed approximately perpendicular to the local 2-fold
axis, shown together with all intermonomer hydrogen bonds and salt bridges (green dots). Segments of monomers A and D are given in blue and 
yellow, respectively. 



by four hydrogen bonds involving both main and side chains 
(Fig. 5b). Thus, the A-D (and the corresponding B-C) interface
comprises a number of polar/charged interactions in
addition to several hydrophobic contacts. 

The A-B homodimer carries a number of positively charged 
residues at the periphery, which cluster and form an obliquely 
oriented two-lobed patch of positive charges that extends 
toward one of the front sides of each monomer, giving rise to 
the blue-colored electrostatic potential surfaces in Fig. 6. With 
an overall length of almost 100 Å, this patch would allow tight
electrostatic binding of an extended heparin chain of ≈20
sugars running obliquely along the A-B edge as shown in Fig. 
6. The length of such heparin chains is in good agreement with 
the experimentally observed stabilization of the tetramer by 
heparin fractions of molecular mass 5,500 Da and above (40). 
On the peripheral surface of the A-D (and the corresponding 
B-C) homodimer, in contrast, positive charges are counter- 
balanced by adjacent negative ones. 
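
The agreement between patch length, chain length, and molecular mass noted above can be checked with simple averages. The Python sketch below derives the per-sugar length and mass implied by the three figures quoted in this paragraph. All three inputs are approximate values from the text, the variable names are illustrative, and the results are back-of-the-envelope averages rather than measured properties of heparin.

```python
# Approximate figures quoted in the text.
PATCH_LENGTH_ANGSTROM = 100.0   # "almost 100 A"
SUGARS_PER_CHAIN = 20           # "approximately 20 sugars"
MIN_STABILIZING_MASS_DA = 5500.0

# Implied averages per monosaccharide unit.
length_per_sugar = PATCH_LENGTH_ANGSTROM / SUGARS_PER_CHAIN
mass_per_sugar = MIN_STABILIZING_MASS_DA / SUGARS_PER_CHAIN

print(f"implied length per sugar: {length_per_sugar:.1f} A")
print(f"implied mass per sugar:   {mass_per_sugar:.0f} Da")
```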

Interaction with Substrates and Inhibitors. The immediate 
vicinity of the tryptase active site is quite similar in structure 
to that of trypsin. The specificity S1 pocket, which opens to the
west of the reactive Ser-195 (Fig. 3a), is virtually identical to 



that of trypsin and well suited to accommodate P1-Lys and Arg
side chains. The 4-amidinophenylpyruvic acid (APPA) molecule
inserts into this pocket in the same manner as in the
complex with trypsin (41). Thus, its amidino group is hydrogen
bonded to both Asp-189 carboxylate oxygens, Gly-219 O and
Ser-190 Oγ, and its phenyl ring is sandwiched between peptide
planes 215-216 and 190-192. Ser-195 Oγ bonds to the carbonyl
group of the tetrahedral pyruvate part of APPA (Fig. 3a), and
hydrogen bonds to His-57 Nε. As indicated by the relatively low
equilibrium dissociation constant of the APPA-tryptase complex
[Ki 0.71 μM; (42)], APPA fits well to the tryptase active
site. Toward the south of the active site of tryptase, the side
chains of Asp-143, Asp-145, and Asp-147 protrude from the 
relatively flat and hydrophobic southern embankment (Fig. 
3a). The resulting negative charge cluster provides a second 
anchoring point for dibasic synthetic tryptase inhibitors such as 
bis-benzamidines (17, 42, 43), allowing favorable interactions 
with a distal basic group such as in pentamidine. The structural 
basis of the unexpected high affinity of bifunctional inhibitors 
containing suitably arranged adjacent imidazole moieties such 
as present in the inhibitor BABIM and closely related ana- 
logues (43, 44) has recently been revealed: two nitrogen atoms 







Fig. 6. Model of the binding of a 20-mer heparin-like glycosaminoglycan
chain along the A-B edge of the tryptase tetramer. The
solid-surface representation of tryptase indicates positive (blue) and
negative (red) electrostatic potential contoured from -4 kT/e to 4
kT/e. The heparin chain (green/yellow/red stick model) is long
enough to bind to clusters of positively charged residues on both sides 
of the monomer-monomer interface, thereby bridging and stabilizing 
the interface which is exclusively hydrophobic in nature (see Fig. 5a). 

of the two methylene-connected benzimidazoles coordinate a 
zinc ion that also binds to the active-site located Ser-195 Oγ
and His-57 Nε (44). The zinc-mediated binding enhancement
of BABIM-like inhibitors is particularly large but not restricted 
to tryptase. 
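
To give a feel for what the dissociation constant quoted above for the APPA-tryptase complex implies, the Python sketch below evaluates the fraction of active sites occupied at several inhibitor concentrations. It assumes simple reversible 1:1 binding at independent sites, with inhibitor in large excess over enzyme and no competing substrate; these assumptions, the function name, and the chosen concentrations are illustrative and are not taken from the text.

```python
KI_APPA_MICROMOLAR = 0.71  # dissociation constant quoted in the text


def fraction_occupied(inhibitor_um, ki_um=KI_APPA_MICROMOLAR):
    """Fraction of sites occupied for simple 1:1 equilibrium binding."""
    return inhibitor_um / (inhibitor_um + ki_um)


# Illustrative inhibitor concentrations in micromolar.
for concentration in (0.071, 0.71, 7.1):
    occupied = fraction_occupied(concentration)
    print(f"[APPA] = {concentration:6.3f} uM -> {occupied:.0%} of sites occupied")
```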

Toward the east, the substrate-binding site of tryptase is not 
only bounded by the side chains of Tyr-37B and Tyr-74 of 
monomer A, but also by the Phe-153 benzyl group and the
152-spur of the neighboring monomer B. Thus, binding of 
extended substrate chains is limited to about P5' (Fig. 7). 




Fig. 7. View from the LDTI inhibitor (represented only by its 
reactive site loop P7 to P3') toward the active-site cleft. The PI Lys 
residue is buried. 



Toward the north, the 97-loop of monomer A borders the 
substrate binding region in a manner different from most other 
serine proteinases, and together with the side chains of Phe-94, 
Ala-97, and Gln-98 of monomer D forms a projecting "can- 
opy." The S2 subsite underneath is open and larger than that 
of trypsin. The S3/S4 subsite above the Trp-215 indole moiety 
is fully blocked by the side chain of Gln-98 and the phenolic 
group of Tyr-95 provided by monomer D. Toward the west, 
however, the substrate-binding site is bordered exclusively by 
segments of the D-monomer, in particular the His-57 imida- 
zole ring and segment 57-60. Thus, the active centers of 
monomers A and D (B and C) are spatially close (distance ≈23
Å for the A-D pair) to each other in the tryptase tetramer,
rendering the tryptase tetramer suitable for the specific bind- 
ing of bifunctional inhibitors with relatively short spacers. 

The central pore of tryptase restricts the size of accessible 
substrates and inhibitors considerably. For larger proteins such 
as fibronectin and the zymogens of stromelysin-1 and uro- 
kinase-type plasminogen activator, the cleavage sites must be 
extended into the active sites. Docking experiments with 
C-terminally truncated prostromelysin-1 (45) and with single-
chain tissue plasminogen activator (29) as a model for 
prourokinase show that the activation cleavage loops of these 
proproteinases must be extracted from their crystal structures 
to allow binding in the tryptase active center. More flexible 
peptides, in contrast, could easily thread through the pore of 
the tetramer to be processed or destroyed. Flexible polypep- 
tide chains with two distant basic residues, as in "vasoactive 
intestinal peptide" (18), might even dock to adjacent active 
sites simultaneously to produce fragments of distinct length. 

The active centers of the tryptase monomers are also largely 
inaccessible for macromolecular inhibitors. The only exception 
known is LDTI, an "atypical" Kazal-type inhibitor that is 
smaller than the classical members of this family (16). LDTI 
has been shown to bind to trypsin through its reactive-site loop 
(residues P4 to P4') in a canonical manner (17, 46). In the 
model of the complex with tryptase monomer A, the four 
N-terminal residues preceding this binding segment could 
bend toward the south (with respect to Figs. 3 and 7), leading 
to the juxtaposition of the basic Lys-I1-Lys-I2 amino terminus
(with the suffix I identifying inhibitor residues) with the
carboxylate groups of Asp-143 and Asp-147 of monomer A.
Alternatively, the two Lys residues could interact with Asp- 
60B of molecule D. The involvement of such electrostatic 
interactions is supported by the deleterious effect of deletions 
and substitutions of these basic residues on the affinity of 
LDTI toward tryptase but not trypsin (17). The LDTI reactive- 
site loop, running from Cys-I14 (P5) to Pro-I22 (P4'; ovomucoid
numbering), is relatively small compared with classical
Kazal-type inhibitors, allowing good overall fit to the restricted 
substrate binding groove (Figs. 7 and 8a). Furthermore, its
central helix is one turn shorter, so that it just fits into the 
central pore of the tetramer on canonical binding to the active 
site of monomer A with only a few narrow contacts of its 
molecular antipole, opposite to its reactive-site loop, with the 
147-loop of monomer D. Docking of a second LDTI molecule 
is possible at the opposite active centers of either monomer B 
or monomer C (Fig. 8a). A slight collision between Cys-I56 and
Gly-I28 of two bound LDTI molecules could be relieved by
minor torsion in the proteinase-inhibitor interfaces, as ob- 
served for other canonically binding inhibitors such as eglin c 
(46). Any such torsion in the LDTI molecule bound to 
monomer A would impose an opposing torsion in the LDTI 
molecule bound to monomer B, facilitating such a relaxation. 
The simultaneous binding of two LDTI molecules to the 
tetramer is in good agreement with experimental results 
showing ≈50% inhibition of the cleavage activity toward small
chromogenic substrates by nanomolar LDTI concentrations 
(16). Modeling experiments with more elongated classical 
Kazal-type inhibitors or with the prototypical bovine pancre- 







Fig. 8. Models of the interaction of the human tryptase tetramer with proteinaceous inhibitors. The tryptase tetramers are shown as green
ribbons. An inhibitor molecule (blue) is modeled into the active site of monomer A by superposition of the proteinase moiety of known 
proteinase-inhibitor complexes to a tryptase monomer. For LDTI and BPTI the target proteinase was trypsin (17, 49), for MPI chymotrypsin (47). 
The active sites of the other tryptase monomers are occupied by APPA molecules (orange). Parts of the inhibitors clashing with the structure of 
tryptase (i.e., a distance smaller than 1.5 Å between the Cα-atoms of the respective molecules) are highlighted in red. (a) In addition to one molecule
of the "atypical" Kazal-type inhibitor LDTI bound to the tryptase monomer A a second molecule (shown in pink and yellow) can bind to the active
site of either monomer B or C. (b) Bovine pancreatic trypsin inhibitor (aprotinin), (c) Human mucous proteinase inhibitor bound to tryptase with 
its inhibitorily active second domain. 



atic trypsin inhibitor indicate strong collisions of their distal 
pole segments with the neighboring monomers D and B, in 
particular with the 147-loops, explaining the observed inactivity
of these inhibitors toward tryptase (Fig. 8b). The central
portion of the two-domain mucous proteinase inhibitor
(MPI = SLPI = HUSI-I) would clash with the A-D interface
region of the tryptase tetramer if bound to the active site of 
monomer A (Fig. 8c) via its inhibitorily active second domain 
(47). Similarly, elafin (= SKALP), an inhibitor corresponding 
to the MPI second domain (48), should not be able to inhibit 
tryptase. The much larger plasma proteinase inhibitors are 
clearly far too bulky to fit into the narrow pore of the tryptase 
tetramer and gain access to one of the active centers. 
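
The clash criterion used for these docking models, a Cα-Cα distance below 1.5 Å as stated in the legend to Fig. 8, is straightforward to express in code. The Python sketch below flags inhibitor atoms that violate it. The function name is illustrative and the coordinates are toy values invented for the example; they are not taken from any of the modeled complexes, and the sketch is not the procedure the authors used.

```python
import math

CLASH_CUTOFF_ANGSTROM = 1.5  # criterion stated in the legend to Fig. 8


def clashing_atoms(inhibitor_ca, enzyme_ca, cutoff=CLASH_CUTOFF_ANGSTROM):
    """Indices of inhibitor C-alpha atoms closer than `cutoff` to any enzyme C-alpha."""
    return [
        index
        for index, atom in enumerate(inhibitor_ca)
        if any(math.dist(atom, other) < cutoff for other in enzyme_ca)
    ]


# Toy coordinates in angstroms, invented purely for illustration.
enzyme = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
inhibitor = [(1.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 1.4, 0.0)]

print(clashing_atoms(inhibitor, enzyme))
```

For real structures the same test would be run over the Cα coordinates of the superposed inhibitor and the neighboring tryptase monomers; a spatial index would be preferable to the all-pairs loop for larger systems.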

CONCLUSION 

In summary, the structure of the βII-tryptase tetramer has
been identified based on the four crystallographically inde- 
pendent quasiidentical monomers and the analysis of their 
arrangement within the crystal packing. With its frame-like 
architecture and its active centers facing a narrow central pore, 
the resulting tryptase tetramer structure explains most of the 
distinct properties of the biologically active tryptase tetramer 
in solution. The unusual substrate specificity, with a preference 
for peptidergic substrates, and the resistance to proteinaceous 
inhibitors other than LDTI are both caused by the limited
accessibility of the active sites within the narrow central pore.
The tetramer can be stabilized by heparin glycosaminoglycan
chains larger than ≈20 sugar residues, a length required to
bridge the weaker of the two distinct monomer-monomer
interfaces. The loss of enzymatic activity on dissociation of the
tetramer is caused by stabilization by internal molecular
groups of a zymogen-like rather than the active state. Finally,
the knowledge of the structure of the active center of the 
monomer as well as of the distances between neighboring 
active sites allows the rational design of multifunctional inhib- 
itors. Such inhibitors that bind to more than one active center 
will ideally have potentiated affinity, conferring selectivity for 
the tryptase tetramer. Such inhibitors will be valuable as 
pharmacological tools to probe the pathophysiological func- 
tion(s) of tryptases in vivo and may have therapeutic potential 
against asthma and other mast-cell related disorders. 



We are grateful to R. Huber and H. Fritz for their generous support.
We thank D. Grosse and R. Mentele for their excellent help in 
crystallization and amino acid sequence analysis. This work was 
supported by Sonderforschungsbereich 469 of the University of Mu- 
nich, the Deutsche Forschungsgemeinschaft (STU 161, BO 1279), the 
Fonds der Chemischen Industrie, and programs BIO4-CT98-0418 and 
TMR ERBFXCT 98-0193 of the European Union. 

1. Miller, J. S., Westin, E. H. & Schwartz, L. B. (1989) J. Clin. Invest. 
84, 1188-1195. 

2. Pallaoro, M., Fejzo, M. S., Shayesteh, L., Blount, J. L. & Caughey, 
G. H. (1999) J. Biol. Chem. 274, 3355-3362. 

3. Schwartz, L. B., Irani, A. M., Roller, K., Castells, M. C. & 
Schechter, N. M. (1987) J. Immunol. 138, 2611-2615. 

4. Xia, H. Z., Kepley, C. L., Sakai, K., Chelliah, J., Irani, A. M. & 
Schwartz, L. B. (1995) J. Immunol. 154, 5472-5480. 

5. Schwartz, L. B., Sakai, K., Bradford, T. R., Ren, S., Zweiman, B., 
Worobec, A. S. & Metcalfe, D. D. (1995) J. Clin. Invest. 96, 
2702-2710. 

6. Sakai, K., Ren, S. & Schwartz, L. B. (1996) J. Clin. Invest. 97, 
988-995. 

7. Caughey, G. H. (1997) Am. J. Respir. Cell Mol. Biol. 16, 621-628. 

8. Johnson, P. R. A., Ammit, A. J., Carlin, S. M., Armour, C. L., 
Caughey, G. H. & Black, J. L. (1997) Eur. Respir. J. 10, 38-43. 

9. Rice, K. D., Tanaka, R. D., Katz, B. A., Numerof, R. P. & Moore, 
W. R. (1998) Curr. Pharm. Des. 4, 381-396. 

10. De Sanctis, G. T., Merchant, M., Beier, D. R., Dredge, R. D., 
Grobholz, J. K., Martin, T. R., Lander, E. S. & Drazen, J. M. 
(1995) Nat. Genet. 11, 150-154. 

11. Hunt, J. E., Stevens, R. L., Austen, K. F., Zhang, J., Xia, Z. & 
Ghildyal, N. (1996) J. Biol. Chem. 271, 2851-2855. 

12. Schwartz, L. B. (1994) Methods Enzymol. 244, 88-100. 

13. Caughey, G. H. (1995) Mast Cell Proteases in Immunology and 
Biology (Dekker, New York). 

14. Ren, S., Sakai, K. & Schwartz, L. B. (1998) J. Immunol. 160, 
4561-4569. 

15. Selwood, T., McCaslin, D. R. & Schechter, N. M. (1998) 
Biochemistry 37, 13174-13183. 

16. Sommerhoff, C. P., Söllner, C., Mentele, R., Piechottka, G. P., 
Auerswald, E. A. & Fritz, H. (1994) Biol. Chem. Hoppe-Seyler 
375, 685-694. 

17. Stubbs, M. T., Morenweiser, R., Stürzebecher, J., Bauer, M., 
Bode, W., Huber, R., Piechottka, G. P., Matschiner, G., 
Sommerhoff, C. P., Fritz, H., et al. (1997) J. Biol. Chem. 272, 
19931-19937. 



Colloquium Paper: Sommerhoff et al. 

Proc. Natl. Acad. Sci. USA 96 (1999) 10991 



18. Tam, E. K. & Caughey, G. H. (1990) Am. J. Respir. Cell Mol. Biol. 
3, 27-32. 

19. Gruber, B. L., Marchese, M. J., Suzuki, K., Schwartz, L. B., 
Okada, Y., Nagase, H. & Ramamurthy, N. S. (1989) J. Clin. 
Invest. 84, 1657-1662. 

20. Stack, M. S. & Johnson, D. A. (1994) J. Biol. Chem. 269, 
9416-9419. 

21. Molino, M., Barnathan, E. S., Numerof, R., Clark, J., Dreyer, M., 
Cumashi, A., Hoxie, J. A., Schechter, N., Woolkalis, M. & Brass, 
L. F. (1997) J. Biol. Chem. 272, 4043-4049. 

22. Lohi, J., Harvima, I. & Keski-Oja, J. (1992) J. Cell. Biochem. 50, 
337-349. 

23. Little, S. S. & Johnson, D. A. (1995) Biochem. J. 307, 341-346. 

24. Schwartz, L. B., Bradford, T. R., Littman, B. H. & Wintroub, 
B. U. (1985) J. Immunol. 135, 2762-2767. 

25. Pereira, P. J., Bergner, A., Macedo-Ribeiro, S., Huber, R., 
Matschiner, G., Fritz, H., Sommerhoff, C. P. & Bode, W. (1998) 
Nature (London) 392, 306-311. 

26. Blevins, R. A. & Tulinsky, A. (1985) J. Biol. Chem. 260, 4264- 
4275. 

27. Bode, W. & Schwager, P. (1975) J. Mol. Biol. 98, 693-717. 

28. Wang, D., Bode, W. & Huber, R. (1985) J. Mol. Biol. 185, 
595-624. 

29. Renatus, M., Engh, R. A., Stubbs, M. T., Huber, R., Fischer, S., 
Kohnert, U. & Bode, W. (1997) EMBO J. 16, 4797-4805. 

30. Huber, R. & Bode, W. (1978) Acc. Chem. Res. 11, 114-122. 

31. Bode, W. (1979) J. Mol. Biol. 127, 357-374. 

32. Freer, S. T., Kraut, J., Robertus, J. D., Wright, H. A. T. & Xuong, 
N. H. (1970) Biochemistry 9, 1997-2009. 

33. Bode, W., Fehlhammer, H. & Huber, R. (1976) J. Mol. Biol. 106, 
325-335. 

34. Hedstrom, L., Lin, T. Y. & Fast, W. (1996) Biochemistry 35, 
4515-4523. 



35. Bode, W., Schwager, P. & Huber, R. (1978) J. Mol. Biol. 118, 
99-112. 

36. Bolognesi, M., Gatti, G., Menegatti, E., Guarneri, M., Marquart, 
M., Papamokos, E. & Huber, R. (1982) J. Mol. Biol. 162, 
839-868. 

37. Higashi, S. & Iwanaga, S. (1998) Int. J. Hematol. 67, 229-241. 

38. Pignol, D., Gaboriaud, C., Michon, T., Kerfelec, B., Chapus, C. 
& Fontecilla Camps, J. C. (1994) EMBO J. 13, 1763-1771. 

39. Banner, D. W., D'Arcy, A., Chene, C., Winkler, F. W., Guha, A., 
Konigsberg, W. H., Nemerson, Y. & Kirchhofer, D. (1996) 
Nature (London) 380, 41-46. 

40. Alter, S. C., Metcalfe, D. D., Bradford, T. R. & Schwartz, L. B. 
(1987) Biochem. J. 248, 821-827. 

41. Walter, J. & Bode, W. (1983) Hoppe-Seylers Z. Physiol. Chem. 
364, 949-959. 

42. Stürzebecher, J., Prasa, D. & Sommerhoff, C. P. (1992) Biol. 
Chem. Hoppe-Seyler 373, 1025-1030. 

43. Caughey, G. H., Raymond, W. W., Bacci, E., Lombardy, R. J. & 
Tidwell, R. R. (1993) J. Pharmacol. Exp. Ther. 264, 676-682. 

44. Katz, B. A., Clark, J. M., Finer-Moore, J. S., Jenkins, T. E., 
Johnson, C. R., Ross, M. J., Luong, C., Moore, W. R. & Stroud, 
R. M. (1998) Nature (London) 391, 608-612. 

45. Becker, J. W., Marcy, A. I., Rokosz, L. L., Axel, M. G., Burbaum, 
J. J., Fitzgerald, P. M., Cameron, P. M., Esser, C. K., Hagmann, 
W. K., Hermes, J. D., et al. (1995) Protein Sci. 4, 1966-1976. 

46. Bode, W. & Huber, R. (1992) Eur. J. Biochem. 204, 433-451. 

47. Grütter, M. G., Fendrich, G., Huber, R. & Bode, W. (1988) 
EMBO J. 7, 345-351. 

48. Tsunemi, M., Matsuura, Y., Sakakibara, S. & Katsube, Y. (1996) 
Biochemistry 35, 11570-11576. 

49. Huber, R., Kukla, D., Bode, W., Schwager, P., Bartels, K., 
Deisenhofer, J. & Steigemann, W. (1974) J. Mol. Biol. 89, 73-101. 





Exhibit 35 



The Three-Dimensional Structure of Asn¹⁰² Mutant of Trypsin: Role 
of Asp¹⁰² in Serine Protease Catalysis 







S. Sprang; T. Standing; R. J. Fletterick; R. M. Stroud; J. Finer-Moore; N-H. Xuong; R. 
Hamlin; W. J. Rutter; C. S. Craik 

Science, New Series, Vol. 237, No. 4817 (Aug. 21, 1987), 905-909. 
Stable URL: 

http://links.jstor.org/sici?sid=0036-8075%2819870821%293%3A237%3A4817%3CW 
Science is currently published by American Association for the Advancement of Science. 






15. Hosts were mounted in 1.34- and 1.00-mm diameter holes in white 
plastic squares (2 by 2 cm). Each wasp was allowed to complete 
examination and oviposition. The wasps were observed individually 
to prevent repeated parasitization of the same host. Trials in which 
the wasp left the host before completing oviposition were rejected. 

16. Mean ± SD was used throughout. Statistical significance was 
determined by t tests. 

17. S. E. Flanders, Pan-Pac. Entomol. 11, 175 (1935). 

18. Head length was measured from the medial ocellus to the tip of the 
closed mandibles by using an ocular micrometer. Wasps differed 
significantly in mean head length between large and small treatment 
groups (P < 0.001). 

19. Single hosts were mounted on white cardboard squares (2 by 2 cm) 
with gum arabic. After host examination was completed, wasps were 
observed as in (15). 

20. Measurements made from films of the initial transit demonstrate a 
significant linear relation between wasp body length and stride 
length [slope, 0.58 ± 0.064 (SE); n = 15, P < 0.01]. 

21. Wasps were observed on single hosts mounted on cardboard cards 
with gum arabic. Only wasps that completed their host examination 
and began ovipositing were included in the data. For details of 
methods and results, see J. M. Schmidt and J. J. B. Smith 
[J. Exp. Biol. 129, 151 (1987)]. 

22. Lepidoptera: Gelechiidae. 

23. We thank IL Tanner and R. Chaplinsky for technical assistance. 
The Natural Sciences and Engineering Research Council of Canada 
provided financial support. 

4 March 1987; accepted 1 June 1987 



The Three-Dimensional Structure of Asn¹⁰² Mutant of 
Trypsin: Role of Asp¹⁰² in Serine Protease Catalysis 



S. Sprang,* T. Standing, R. J. Fletterick, R. M. Stroud, 
J. Finer-Moore, N-H. Xuong, R. Hamlin, W. J. Rutter, 
C. S. Craik 

The structure of the Asn¹⁰² mutant of trypsin was determined in order to distinguish 
whether the reduced activity of the mutant at neutral pH results from an altered active 
site conformation or from an inability to stabilize a positive charge on the active site 
histidine. The active site structure of the Asn¹⁰² mutant of trypsin is identical to the 
native enzyme with respect to the specificity pocket, the oxyanion hole, and the 
orientation of the nucleophilic serine. The observed decrease in rate results from the 
loss of nucleophilicity of the active site serine. This decreased nucleophilicity may result 
from stabilization of a His⁵⁷ tautomer that is unable to accept the serine hydroxyl 
proton. 



THROUGHOUT THE DIVERSE FAMILY 
of serine proteases, the three residues 
implicated in the bond breaking and 
making events of protease catalysis, His⁵⁷, 
Asp¹⁰², and Ser¹⁹⁵ (chymotrypsin numbering 
system) are conserved. The spatial relation 
among these residues is virtually equivalent 
in the three-dimensional structures of 
all serine proteases studied. The catalytic 
roles of Ser¹⁹⁵ and His⁵⁷ are firmly established 
(1). The substrate (ester or amide) 
carbonyl carbon undergoes a nucleophilic 
attack by the hydroxyl group of Ser¹⁹⁵, 
which leads to the formation of an acyl 
enzyme intermediate. His⁵⁷ functions as a 
catalytic base by assisting in the transfer of a 
proton from the serine hydroxyl to the 
substrate leaving group. The role of Asp¹⁰² 
has not yet been defined. The three functions 
proposed for this residue are: (i) stabilizing 
the His⁵⁷ conformation that is required 
for catalysis (2), (ii) stabilizing the 



S. Sprang, T. Standing, R. J. Fletterick, R. M. Stroud, J. 
Finer-Moore, Department of Biochemistry and Biophysics, 
University of California, San Francisco, San Francisco, 
CA 94143. 

N-H. Xuong and R. Hamlin, Department of Physics, 
University of California, San Diego, La Jolla, CA 92093. 
W. J. Rutter, Hormone Research Institute, University of 
California, San Francisco, San Francisco, CA 94143. 
C. S. Craik, Department of Biochemistry and Biophysics 
and Department of Pharmaceutical Chemistry, University 
of California, San Francisco, San Francisco, CA 
94143. 

*Present address: Howard Hughes Medical Institute, 
University of Texas, Dallas, TX 75235. 



appropriate His⁵⁷ tautomer (2), and (iii) 
stabilizing the positively charged histidine 
that forms during the reaction (3). The 
proposed functions were tested with a genetically 
engineered mutant of the anionic 
isozyme of rat trypsin that was constructed 
by replacing Asp¹⁰² with an asparagine (4), 
designated here as D 102 N trypsin, where 
D is Asp and N is Asn. 

The activity of D 102 N trypsin has been 
studied as a function of pH (4). The activity 
of this mutant enzyme toward a variety of 
substrates is reduced by four orders of magnitude 
relative to trypsin between pH 7 and 
9, where the latter is optimally active. 
The Michaelis constant, Kₘ, of the mutant 
enzyme is virtually unaffected (4). This 
raises the question of whether the chemical 
properties of the asparagine itself or the 
conformational differences in the enzyme 
are responsible for the loss of activity in 
D 102 N trypsin. To address this point, we 
describe the three-dimensional structure of 
D 102 N trypsin at both pH 6 and 8. 

Orthorhombic crystals (space group 
P2₁2₁2₁) of rat D 102 N trypsin grown at 
pH 6 in the presence of benzamidine were 




Fig. 1. An α-carbon diagram (stereoscopic) of anionic rat D 102 N trypsin at pH 6 (9-12) (green) is 
superimposed on bovine trypsin (blue). Residues in rat trypsin (72) that differ in side-chain type from 
corresponding residues in the bovine sequence (25) are highlighted in red here. Side-chain positions for 
residues Asn¹⁰², His⁵⁷, and Ser¹⁹⁵ are also shown in red. The root-mean-square (rms) difference in 
position between corresponding atoms of D 102 N rat trypsin in the crystals grown at pH 6 and bovine 
trypsin (13, 26) after least-squares superposition is 0.47 Å for all main-chain atoms and 0.67 Å for all 
side-chain atoms. Values quoted are the average of those obtained for molecules 1 and 2 in the 
asymmetric unit of the D 102 N trypsin crystals grown at pH 6. The computed rms distance may be an 
underestimate of the true differences in the two structures because of the use of bovine trypsin as the 
initial phasing model. The rms difference after superposition between all atoms of the two molecules in 
the asymmetric unit is 0.21 Å. The rms deviation between the main-chain atoms of the pH 6 and pH 8 
crystal forms of D 102 N trypsin is 0.25 Å. 
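
The rms differences quoted in this caption are, in the usual convention, the square root of the mean squared distance between paired atoms after superposition. A minimal sketch of that arithmetic follows, assuming the two coordinate sets are already superimposed and listed in the same atom order. The function name and the coordinates are hypothetical and are not taken from the trypsin structures.

```python
import math


def rms_difference(coords_a, coords_b):
    """Root-mean-square distance between paired atoms.

    Assumes the two models are already superimposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates in angstroms, for illustration only.
model_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
model_b = [(0.3, 0.0, 0.0), (1.5, 0.4, 0.0), (1.5, 1.5, 0.0)]
print(round(rms_difference(model_a, model_b), 3))
```

The least-squares superposition itself is a separate step that this sketch does not attempt.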



21 AUGUST 1987 REPORTS 905 



obtained by vapor diffusion against polyethylene 
glycol (Figs. 1 and 2, top). Diffraction 
data were measured to 2.3 Å resolution with 
monochromatic copper Kα radiation and 
the crystal cooled to 4°C on a multiwire area 
detector with the procedures described by 
Xuong et al. (5) (Table 1). A cubic crystal 
form (space group I23) was obtained at pH 
8 by vapor diffusion against magnesium 
sulfate. Diffraction data for this form were 
recorded to 2.8 Å resolution with 
monochromatic copper Kα radiation on a 
diffractometer (7) (Table 1 and Fig. 2, middle). 
Both crystal structures were determined by 
molecular replacement methods (5) and 
refined by stereochemically restrained 
minimization of the differences between observed 
and computed structure amplitudes (6, 9-12) 
(Table 1 and Figs. 1 and 2). 

The tertiary structures of the mutant rat 
anionic trypsin at both pH 6 and 8 are 
essentially identical to that of the bovine 
enzyme (7, 13). The largest differences 
between the enzymes from rat and cow are 
localized to four segments in the NH2 terminal 
domain, all outside the β core, where 
deviations between corresponding main 
chain atoms exceed 1.0 Å (Fig. 1). The 
structural similarity between D 102 N trypsin 
and bovine trypsin is quite high in the 
neighborhood of the active site; no significant 
differences in the relative positions 
(<0.3 Å) (Table 2) or relative thermal factors 
are observed for Asn¹⁰², Ser¹⁹⁵, or the 
oxyanion binding site (14); that is, the main-chain 
amide groups of residues 193 and 
195. The only exception occurs in crystals 
grown at pH 6, where the side chain of 
His⁵⁷ is statistically disordered (Fig. 2, top) 
(11, 12), and is partitioned between the 
gauche conformation observed in native 
trypsin and an alternative trans conformation, 
in which the imidazole side chain is 





Fig. 2. (Top) The difference Fourier map 
(Fobs - Fcalc) at the catalytic site of D 102 N rat 
trypsin at pH 6. The side-chain atoms of His⁵⁷ 
were omitted from the calculated structure factors 
and phases. The trans and gauche conformations 
of the histidine side chain related by torsional 
differences of 70° are superimposed on the 
electron density. The difference electron density is 
shown at a contour level of 0.2 electron per cubic 
angstrom. The map extends over all atoms shown 
in the figure. No negative density is present in this 
region at the 0.2 electron per cubic angstrom 
level. Two lobes of flat, ellipsoidal density are 
evident, both continuous with the density 
corresponding to the Cβ atom of His⁵⁷. The peaks are 
of unequal magnitude; the stronger peak is located 
within the active site between the side chains of 
Asn¹⁰² and Ser¹⁹⁵ at a position coincident with 
His⁵⁷ in the structures of bovine trypsin, and the 
second weaker peak is outside of the active site 
pocket. The shape of both lobes of density and 
their proximity to the Cβ atom of His⁵⁷ rules out 
the assignment of either peak to ordered solvent. 
(Middle) A difference Fourier map (Fobs - Fcalc) 
showing the catalytic site of D 102 N trypsin 
from crystals grown at pH 8. The side-chain 
atoms of His⁵⁷ were omitted from the calculated 
structure factors and phases. At this pH, only the 
gauche conformer for His⁵⁷ is observed in the 
difference electron density. The histidine 
conformation is almost identical to that observed in 
bovine trypsin-benzamidine complex (7). The 
structure of D 102 N trypsin at pH 8 was 
determined by molecular replacement, using the 
refined structure at pH 6 as a search model. The 
side-chain atoms of Asn¹⁰², His⁵⁷, and Ser¹⁹⁵ as 
well as solvent, benzamidine, and calcium ion 
atoms were omitted from this model. The rotation 
function produced only one significant peak 
and was evaluated with all data to 2.8 Å and an 
integration radius from 4.0 to 16 Å. The R factor 
at the correct translation position was 0.35. A 
difference Fourier map computed with phases 
from the molecular replacement solution revealed 
the positions of the omitted side chains, calcium 
ion, and benzamidine molecule. These were 
included in the phasing model and the structure was 
subjected to 23 cycles of stereochemically 
restrained crystallographic refinement (Table 1) (6). 
(Bottom) The bovine trypsin structure (thin 
lines) is superimposed on that of D 102 N rat 
trypsin crystallized at pH 6.0 (thick lines). Both 
conformers of His⁵⁷ in D 102 N rat trypsin are 
shown. 



displaced from the active site toward the 
solvent. Only the native gauche His⁵⁷ 
conformation is observed in crystals grown at 
pH 8. Unless otherwise stated, all references 
to His⁵⁷ in the following discussion refer to 
the native conformer. 

In both the pH 8 and 6 crystal forms, 
Asn¹⁰² is superimposable within experimental 
error with Asp¹⁰² of the bovine enzyme 
(Fig. 2). In trypsin, one of the carboxylate 
oxygen atoms of Asp¹⁰² accepts hydrogen 
bonds from the main-chain amide groups of 
residues 56 and 57, and the second accepts 
hydrogen bonds from both the Nδ1 atom of 




SCIENCE, VOL. 237 



[Fig. 3 diagram, panels A and B; graphic not reproduced.] 



Fig. 3. (A) In the hydrogen bond network found 
in D 102 N trypsin above neutral pH, His⁵⁷ is 
unable to accept a proton from Ser¹⁹⁵ Oγ. The 
orientation of the hydrogen bond between His⁵⁷ 
and Ser¹⁹⁵ is the reverse of that observed in the 
bovine trypsin-benzamidine structure (7). (B) In 
the hydrogen bond network of wild-type trypsin, 
His⁵⁷ is an acceptor for the proton from Ser¹⁹⁵. 



Table 1. Crystal and diffraction data for D 102 N 
trypsin. The diffraction data for the crystals grown 
at pH 6 were collected with an area detector, 
whereas the data for the crystals grown at pH 8 
were collected with a diffractometer. 

                                      Crystal form 
Diffraction data                      pH 6          pH 8 

Crystal data 
  Space group                         P2₁2₁2₁       I23 
  Cell dimensions (Å)                 a = 40.4      a = 124.4 
                                      b = 92.0 
                                      c = 127.4 
  Molecules per asymmetric unit       2 

Diffraction data 
  Resolution (Å)                      2.3           2.8 
  Total observations                  90,000        5,000 
  Unique observations                 22,000        4,500 
  R_symm*                             0.05 

Refinement results 
  R_cryst†                            0.16          0.21 
  Resolution (Å)                      6.0-2.3       8.0-2.8 
  rms difference (bond) (Å)‡          0.03          0.03 
  rms difference (angle) (Å)‡         0.05          0.05 

*Agreement between symmetry-related structure-factor 
magnitudes, 

$R_{\mathrm{symm}} = \sum_{h}\sum_{i} \left| \langle F_{h} \rangle - F_{hi} \right| \Big/ \sum_{h} \langle F_{h} \rangle$ 

where $\langle F_{h} \rangle$ is the mean structure factor magnitude of the i 
observations of reflections that are related to the Bragg 
index h. †Agreement between the observed ($F_{\mathrm{obs}}$) 
and calculated ($F_{\mathrm{calc}}$) structure factor magnitudes, 

$R_{\mathrm{cryst}} = \sum \left| \, |F_{\mathrm{obs}}| - |F_{\mathrm{calc}}| \, \right| \Big/ \sum |F_{\mathrm{obs}}|$ 

‡Root-mean-square deviation between the ideal and 
refined bond distances and angle distances. 
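
The crystallographic residual reported in Table 1 can be illustrated with a short calculation. The sketch below uses the conventional definition of the R factor (the summed absolute difference between observed and calculated amplitudes, divided by the summed observed amplitudes), which is assumed here to be the one the table footnote intends. The function name and the amplitude values are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """Conventional residual: sum(||Fobs| - |Fcalc||) / sum(|Fobs|)."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("amplitude lists must be non-empty and equal in length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Hypothetical structure-factor amplitudes, for illustration only.
f_obs = [100.0, 50.0, 25.0, 25.0]
f_calc = [90.0, 55.0, 30.0, 20.0]
print(r_factor(f_obs, f_calc))
```

A value of zero would mean perfect agreement; the refined models in Table 1 report 0.16 and 0.21.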



His⁵⁷ and the Oγ atom of Ser²¹⁴ (Table 2 
and Fig. 3). In D 102 N trypsin, there are 
two chemically distinct conformations possible 
for Asn¹⁰². In one of these the Nδ2 
group of Asn¹⁰² would be oriented toward 
the main-chain amide groups of residues 56 
and 57. Since the asparagine amido group 
cannot form a hydrogen bond with the 
main-chain amides in this orientation, they 
could approach no closer than the sum of 
their van der Waals radii (>3.4 Å). 

The alternative conformation is related to 
the first by a rotation of 180° about the Cβ-Cγ 
bond. In this case, the Oδ1 atom of 
asparagine could accept hydrogen bonds 
from the main-chain amide groups, whereas 
the Nδ2 atom could accept hydrogen bonds 
from the His⁵⁷ imidazole and Ser²¹⁴ hydroxyl 
groups. The two conformations can be 
distinguished by the observed distances between 
the main-chain amides of residues 56 
and 57 and the nearest atom of the Asn¹⁰² 
side chain. The interatomic distances in the 
present model (15, 16) support the assignment 
of the tautomeric form shown in Fig. 
3A. One of the Asn¹⁰² amido atoms is 
located 2.6 Å from the amide nitrogen of 
residue 56 and 3.1 Å from the amide of 
residue 57. This atom of the Asn¹⁰² side 
chain could then be involved in hydrogen 
bonds with these two amides and would 
thus be identified as Oδ1. Asn¹⁰² Nδ2 
would therefore be a hydrogen bond donor 
to both the Nδ1 of His⁵⁷ and the Oγ of 
Ser²¹⁴. Asp¹⁰² accepts hydrogen bonds from 
both of these residues in bovine trypsin. 
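
The distance argument used above to tell the two Asn¹⁰² orientations apart can be sketched as a simple threshold test. This is only an illustration of the reasoning, not the authors' procedure: the 3.4 Å limit and the 2.6 Å and 3.1 Å contacts are the values quoted in the text, while the function name and the atom labels are hypothetical.

```python
VDW_CONTACT_LIMIT = 3.4  # angstroms; van der Waals sum quoted in the text


def assign_amide_atom(distances_to_backbone_n):
    """Label the side-chain atom that faces the main-chain amides.

    If every contact is shorter than the van der Waals limit, the atom
    must be hydrogen bonded to the amide nitrogens and is taken to be
    the oxygen ("Od1"); otherwise it is taken to be the nitrogen ("Nd2").
    """
    if not distances_to_backbone_n:
        raise ValueError("at least one distance is required")
    if all(d < VDW_CONTACT_LIMIT for d in distances_to_backbone_n):
        return "Od1"
    return "Nd2"


# Distances to the amide nitrogens of residues 56 and 57, from the text.
print(assign_amide_atom([2.6, 3.1]))
```

With both contacts below the limit, the atom facing residues 56 and 57 is assigned as the oxygen, which leaves the amide nitrogen facing His⁵⁷.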

In the proposed crystallographic model, 
Asn¹⁰² can only serve as a hydrogen bond 
donor to His⁵⁷; the polarity of the hydrogen 
bond network involving His⁵⁷, residue 102, 
and Ser¹⁹⁵ is reversed in the mutant enzyme 
with respect to that in bovine trypsin (Fig. 
3). For values of pH greater than the pKa of 
the imidazole (Ka is the ionization constant), 
the monoprotonated tautomer must 
be protonated at Nε2 since it serves as a 
hydrogen bond acceptor from Asn¹⁰² at 
Nδ1. In contrast to trypsin, the Nε2 of 
His⁵⁷ in the mutant enzyme is a potential 
hydrogen bond donor to the Oγ of Ser¹⁹⁵. 
Thus His⁵⁷ cannot act as a general base in 
transferring a proton from Ser¹⁹⁵ and this 
probably accounts for the diminished activity 
of D 102 N trypsin near neutral pH. For 
trypsin above neutral pH, where the enzyme 
becomes active, His⁵⁷ is protonated at Nδ1 
(17). Therefore, the presence of a negatively 
charged Asp¹⁰² maintains the unprotonated 
Nε2 with a lone pair of electrons as the 
general base catalyst for transfer of the proton 
from the Oγ of Ser¹⁹⁵ to the leaving 
group. 

A difference Fourier map (Fig. 2, top) for 
the crystals grown at pH 6 was computed 
with the histidine omitted from the calculated 
phases and structure factors, revealing 
two sites for the side chain (11, 12). In one 
of these, the Cβ-Cγ bond is trans to Cα-N, 
and the imidazole is rotated from the catalytic 
site. The trans His⁵⁷ conformer does 
not form a hydrogen bond with Asn¹⁰² or 
Ser¹⁹⁵ but rather is in contact with a solvent 
water molecule at the surface of the enzyme 
(Table 2). The alternative position is nearly 
gauche and similar to the His⁵⁷ conformation 
in bovine trypsin and D 102 N trypsin 
crystallized at pH 8 (Fig. 2, bottom). 

Integration of the difference electron density 
indicates that the occupancy ratio of the 
gauche to trans isomers is approximately 2 
to 1 (Fig. 2, top) (11, 12). A difference map 
computed with phases derived from all of 
the atoms in the refined model reveals residual 
positive electron density in the vicinity of 
the Cε1 of His⁵⁷ (gauche), and may correspond 
to a partially occupied solvent water 
which is present in the active site pocket 
when His⁵⁷ is displaced (trans). 

The displacement of His⁵⁷ from the active 
site of D 102 N trypsin below neutral pH is 
probably a consequence of steric conflicts 
between the protonated Nδ1 atom of His⁵⁷ 
and the proton on the Nδ2 of Asn¹⁰². D 102 N 
trypsin, like its natural homolog, is crystallized 
only in the presence of the substrate 
analog benzamidine, and there are no apparent 
steric conflicts between His⁵⁷ and other 
residues in the catalytic site. However, even 
in trypsin, the native gauche conformation 
of His⁵⁷ imidazole may be energetically 
unfavored and require hydrogen bond stabilization 
by Asp¹⁰². A survey of the χ¹ angles 
of His side chains in refined protein structures 
(Fig. 4) shows that the conformation 
found in bovine trypsin is uncommon. Steric 
hindrance arises as a result of close 
contacts between the Nδ1 and Cδ2 imidazole 
atoms and the main-chain carbonyl carbon 
[contact distances of 3.0 Å and 3.2 Å, 
respectively, are measured from the coordinates 
of bovine trypsin (7)]. Nevertheless, 
His⁵⁷ is well ordered in crystals of native 
trypsin (13, 17) and tritium exchange 
measurements indicate that expulsion of His⁵⁷ 
from the active site pocket occurs in solution 
with a frequency of less than 1 in 50 over the 
pH range 1.5 to 9 (18). Displacement of 
His⁵⁷ from the gauche conformation in serine 
protease crystals has so far been seen to 
occur as a result of steric conflict in covalent 
intermediates formed with certain substrate 
analogs (19, 20) or as a result of the 
introduction of heavy metals into the active site 
(21, 22). In native trypsin, the histidine 
conformation is stabilized by a hydrogen 
bond between the Nδ1 atom of His⁵⁷ and the 
carboxylate oxygen atom of Asp¹⁰². 

Fig. 4. A histogram showing the χ¹ torsion angles of 
353 histidines found in 53 protein structures refined to 
greater than 2.0 Å resolution (11, 26). The χ¹ angle 
of 92° gauche observed in His⁵⁷ of bovine trypsin is 
rare. Angle values are trimodally distributed about 
+60°, 180°, and -60°. The trans conformer that occurs 
at pH 6 in D 102 N rat trypsin is more frequently 
observed. 

Table 2. Conformational and stereochemical data for active site residues in bovine and D 102 N 
trypsins. Values for the two molecules in the asymmetric unit of D 102 N trypsin grown at pH 6 are 
averaged. Distances are not given for the 2.8 Å resolution crystals grown at pH 8. The wild-type 
coordinates are from the bovine trypsin-benzamidine crystal structure (7). 

                                             Conformational          Hydrogen bond 
                                             angles (degrees)        distance (Å) 
Residue            Atoms                     Asn¹⁰²    Wild type     Asn¹⁰²    Wild type 

His⁵⁷ (gauche)     N-Cα-Cβ-Cγ                84        92 
      (trans)                                157 
His⁵⁷ (gauche)     Cα-Cβ-Cγ-Nδ1              -96       -100 
      (trans)                                -93 
Ser¹⁹⁵             N-Cα-Cβ-Oγ                -59       -77 
His⁵⁷ (gauche)     Nδ1-Asn/Asp¹⁰² N/Oδ2                              2.8       2.7 
His⁵⁷ (gauche)     Nε2-Ser¹⁹⁵ Oγ                                     3.2       3.0 
His⁵⁷ (gauche)     Nε2-H₂O O                                         3.0 
Asn¹⁰²/Asp¹⁰²      Oδ1-Ala⁵⁶ N                                       2.6       2.9 
Asn¹⁰²/Asp¹⁰²      Oδ1-His⁵⁷ N                                       3.1       2.8 
Asn¹⁰²/Asp¹⁰²      N/Oδ2-Ser²¹⁴ Oγ                                   2.7       2.8 
Ser¹⁹⁵             Oγ-H₂O O                                          2.9       3.0 

In D 102 N trypsin, the conformation of 
His⁵⁷ appears to be linked to its protonation 
state. In the monoprotonated imidazole tautomer 
that predominates above neutral pH, 
the Nδ1 atom of His⁵⁷ can accept a hydrogen 
bond from Nδ2 of Asn¹⁰². Protonation at 
the histidine Nδ1 at the lower pH results in 
the loss of this hydrogen bond and possibly 
also steric conflict with the Nδ2 of Asn¹⁰². 
The imidazole is then free to rotate to the 
more favored trans conformation, away 
from the catalytic site. Orthorhombic crystals 
of D 102 N trypsin are grown near the 
pKa of histidine, and thus the statistically 
disordered histidine side chain may reflect 
an equilibrium distribution of mono 
(gauche) and diprotonated (trans) forms of 
the His⁵⁷ imidazole. The variant D 102 N 
trypsin is able to react with the active site 
titrant tosyl-L-lysine chloromethyl ketone 
(TLCK) at 20 to 70% of the rate observed 
for trypsin from pH 7.2 to 8.7 (4), which 
suggests that as in the pH 8 crystals, a 
substantial proportion of D 102 N trypsin 
molecules in solution contain His⁵⁷ in the 
native gauche conformation. 

As a result of the substitution of Asn for 
Asp¹⁰², the mutant trypsin reacts with 
diisopropylfluorophosphate (DFP), a reagent 
that specifically titrates the Ser¹⁹⁵ nucleophile, 
10⁴ times more slowly than with 
trypsin (4). The decreased Ser¹⁹⁵ nucleophilicity 
in D 102 N trypsin probably results 
from the lack of a base in the active site to 
accept the serine hydroxyl proton. His⁵⁷ 
does not act as a base in this mutant because 
it exists in the incorrect tautomer. While the 
tautomeric form of His⁵⁷ is changed in D 
102 N trypsin, the oxyanion binding site 
(14), the main-chain amide groups of residues 
193 and 195, is unaltered. The reduced 
activity of the mutant thus gives an 
upper limit to the contribution of transition 
state binding alone to the reaction rate. 
Trypsin normally accelerates the rate of DFP 
hydrolysis by a factor of 10⁸ (20). Our 
results suggest that a factor of 10⁴ in rate 
enhancement may derive from the stabilization 
and orientation of the lone pair on the 
Nε2 atom of His⁵⁷. The remaining factor of 
10⁴ can presumably be ascribed to orientation 
of the nucleophile (Ser¹⁹⁵) and transition 
state binding. Under alkaline conditions 
(pH > 10), the rate of catalysis by the 
mutant approaches 10% of that of the native 
enzyme (4) through an altered mechanism 
in which base catalysis appears to be provided 
by solvent hydroxide. In trypsinogen, the 
situation is reversed; His⁵⁷ is correctly oriented, 
but the oxyanion binding site is not 
properly formed to stabilize the transition 
state (21), even after irreversible binding of 
the transition state analog DFP (23). The 
reaction rate toward DFP is also reduced by 
a factor of ~10⁴ relative to trypsin (20), 
which again ascribes an upper limit of 10⁴ 
rate acceleration to transition state binding. 
Catalytic rate enhancement by serine proteases 
is thus partitioned almost equally between 
(i) orientation and stabilization of the 
enzyme base His⁵⁷ and (ii) the correctly 
oriented serine nucleophile and transition 
state binding site. Studies of D 102 N 
trypsin indicate that the Asp¹⁰² residue plays 
a critical role in the first of these processes, 
perhaps electronically with His⁵⁷ (24), and 
structurally, by providing hydrogen bond 
stabilization of the functional tautomer and 
thus maintaining its correct orientation 
within the catalytic site. 
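
The near-equal partition of the rate enhancement described above can be written out explicitly. The free-energy figure in the second line is an added illustration rather than a value from the report: it assumes simple transition state theory and a temperature of 298 K, which the text does not state.

```latex
\[
\frac{k_{\mathrm{trypsin}}}{k_{\mathrm{uncatalyzed}}} \approx 10^{8}
  = \underbrace{10^{4}}_{\text{orientation and stabilization of His}^{57}}
  \times
  \underbrace{10^{4}}_{\text{Ser}^{195}\text{ orientation and transition state binding}}
\]
\[
\Delta\Delta G^{\ddagger} = RT \ln 10^{4}
  = 4 \times 2.303 \times
    \left(1.987 \times 10^{-3}\ \mathrm{kcal\,mol^{-1}\,K^{-1}}\right)
    \times 298\ \mathrm{K}
  \approx 5.5\ \mathrm{kcal\,mol^{-1}}
\]
```

Under these assumptions each of the two contributions lowers the activation barrier by roughly the same amount, about 5.5 kcal/mol.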



REFERENCES AND NOTES 

1. G. H. Dixon, S. Go, H. Neurath, Biochim. Biophys. Acta 
19, 193 (1956); E. Shaw, M. Mares-Guia, W. 
Cohen, Biochemistry 4, 2219 (1965). 

2. W. W. Bachovchin, Biochemistry 25, 7751 (1986). 

3. A. R. Fersht and J. Sperling, J. Mol. Biol. 74, 137 
(1973); A. R. Fersht, Enzyme Structure and Mechanism 
(Freeman, New York, ed. 1, 1977). 

4. C. S. Craik et al., Science 237, 909 (1987). 

5. N. H. Xuong, S. T. Freer, R. Hamlin, C. Nielsen, 
W. Vernon, Acta Crystallogr. A34, 289 (1978); R. 
Hamlin et al., J. Appl. Crystallogr. 14, 85 (1981); A. 
J. Howard, C. Nielsen, N. H. Xuong, Methods 
Enzymol. 114, 452 (1985). 

6. W. A. Hendrickson and J. H. Konnert, in Computing 
in Crystallography, R. Diamond, S. Ramaseshan, 
K. Venkatesan, Eds. (Indian Academy of Sciences, 
Bangalore, 1980), p. 13.01. The program PROLSQ 
written by Hendrickson and Konnert was used for 
refinement. The modification added to PROLSQ by 
B. Finzel was used to decrease computation time. 

7. J. L. Chambers and R. M. Stroud, Acta Crystallogr. 
B33, 1824 (1977); ibid. B35, 1861 (1979); M. 
Krieger, L. M. Kay, R. M. Stroud, J. Mol. Biol. 83, 
209 (1974). 

8. M. G. Rossmann, Ed., The Molecular Replacement 
Method (Gordon & Breach, New York, 1972); R. A. 
Crowther, ibid., p. 173. 

9. The structure of D102N rat trypsin at pH 6 was
determined by molecular replacement methods (5)
by using the atomic coordinates of bovine trypsin-
benzamidine (7) as a search set. The coordinates
were modified by removal of all side-chain atoms for
positions at which rat and bovine trypsins differ in
amino acid sequence, as well as those of His57 and
Ser195. Coordinates for solvent (benzamidine) and
the bound calcium ion were excluded. The crystals
grown at pH 6 exhibit pseudotranslational symmetry
such that the unit cell comprises a b axis repeat of
two P2(1)22(1) subcells related by a translation of b/2.
As a consequence, reflections with b odd for 

3.0 Å are systematically weak or absent.
The relative rotation of the search coordinates with
respect to the rat trypsin unit cell was determined by
using the fast rotation function developed by
Crowther (8). The correct solution was found with
data to 3.0 Å resolution and an integration radius
from 4.5 Å to 16.0 Å. The position of the rotated
search model in the D102N trypsin unit cell was
found by an R-factor search (with a program
obtained from E. Dodson and P. Evans), which gave
an R factor of 0.43. The position was refined by least

SCIENCE, VOL. 237



squares with the computer program CORELS (10).
The positional parameters of individual atoms were
then refined subject to stereochemical restraints by
using the subcell data (6). The positions of missing
side-chain atoms and those of the benzamidine and
calcium were determined from the subcell difference
electron density map computed from the refined
model. A model of the full crystallographic asymmetric
unit in the correct P2(1)2(1)2(1) unit cell was then
constructed by adding a replicate of the trypsin
molecule translated by 46 Å along b and 32 Å
along c. The full model was refined in three stages.
In each stage the model was refit to a difference
Fourier map computed with the coefficients
(2Fobs - Fcalc). Strong peaks in the electron density
in positions consistent with hydrogen bond contacts
to the protein or other established solvent positions
were included in the model as ordered solvent. Next,
the positional and thermal parameters of all atoms
were refined by iterations of restrained crystallographic
least squares, with data in the resolution
range 6 Å to 2.3 Å. Refinement was stopped
when further cycles failed to reduce the crystallographic
R factor and when the mean shift in coordinate
positions was less than 0.05 Å. Refined coordinates
were then used to compute phases for a new
electron map to be used in the next stage of manual
refitting. After the third stage (R factor = 0.18),
examination of the electron density failed to reveal
errors or ambiguity in main- or side-chain positions,
although the side chains of six residues located at the
surface of the molecules were disordered and could
not be defined. Up to this point, side-chain atoms
for His57, Asn102, or Ser195 had been excluded from
the model. A difference electron density map
(Fobs - Fcalc) revealed strong and well-ordered density
for the Asn102 and Ser195, but the His57 residue
appeared to be statistically disordered (Fig. 2, top)
(11).

10. J. L. Sussman, S. R. Holbrook, G. M. Church, S. H.
Kim, Acta Crystallogr. A32, 311 (1976).

11. The possibility that one or other of the peaks are
artifactual was tested by independent refinement of
two alternative models: one with His57 fit to the
stronger, internal density and the second with His57
fit to the external density. In each model the His57
atoms were assigned full occupancy and side-chain
positions for Asn102 and Ser195 were included. Each
model was subjected to restrained crystallographic
refinement by varying the thermal and positional
parameters of all atoms. Subsequently, a difference
Fourier map (Fobs - Fcalc) was computed for each
model with the use of the refined positional and
thermal parameters for all of the atoms in the
respective models. In both cases, residual electron
density appeared at the alternative histidine site.
Again, the observed density peaks were contiguous
with the Cβ atom of His57 and thus could not be
interpreted as ordered water molecules. The relative
occupancy of the two histidine positions and the
total occupancy of both positions relative to other
histidine side chains was estimated by integration of
difference electron density at all of the histidine side-
chain positions in one of the trypsin molecules in the
asymmetric unit. The difference Fourier map
(Fobs - Fcalc) used in the integration was computed
from a model in which the side-chain atoms of all
four histidine residues (at sequence positions 40, 57,
70, and 87) were removed from the coordinate set
of one molecule. Integration was performed manually
by summing over all grid points within 2.0 Å of
histidine atomic positions that had electron density
at least one standard deviation greater than the
background density. After normalization the apparent
relative integrated difference densities at the
histidine side-chain positions were: His40, 0.87;
His57, 0.60; His70, 0.79; and His87, 1.0. All but
His57 are well ordered, so the range in integrated
densities reflects thermal motion and experimental
error. The sum of the density over the two His57
side-chain sites is lower than the mean density of the
well-ordered histidine side chains, but is consistent
with the high B factors of His57 atoms at both
positions. The relative occupancy of the alternative
His57 positions was estimated by integrating the
difference density at the Nδ1 and Cε1 atoms of the
gauche conformer and the Cδ2 and Nε2 atoms of
the trans conformer and by taking the ratio of the
integrated densities for the two positions. The
remaining histidine atoms were not included in the
integration because the resolution of the data set did
not allow the densities of the two conformers to be
resolved at those positions.

Final refined positional and thermal parameters
for both trans and gauche conformers were determined
by refining an atomic model in which both
conformers were simultaneously included. Side-
chain atoms of the gauche conformer were assigned
occupancies of 0.67 and atoms of the trans isomer
were assigned occupancies of 0.33 based on the
estimate derived from the integration described
above (12). After three final cycles of refinement of
all thermal and positional parameters of both trypsin
monomers in the asymmetric unit, the crystallographic
R factor was 0.161.

12. A modified version of PROTIN (obtained from J.
Smith) does not generate restraints between alternate
side-chain positions of a statistically disordered
residue. This allows refinement of two conformations
of an amino acid simultaneously.

13. W. Bode and P. Schwager, J. Mol. Biol. 98, 693
(1975).

14. R. Henderson, ibid. 54, 341 (1970).

15. An upper estimate of the mean error in atomic
position is 0.25 Å. It was obtained by an analysis of
the variation of crystallographic R factor as a function
of resolution (16).

16. V. Luzzati, Acta Crystallogr. 6, 142 (1953).

17. A. A. Kossiakoff and S. A. Spencer, Biochemistry 20,
6462 (1981).

SERINE PROTEASES FUNCTION IN
many biological systems to hydrolyze
specific polypeptide bonds. Trypsin, a
well-studied member of this family, catalyzes
the hydrolysis of peptide and ester
substrates that contain lysyl or arginyl side
chains. Serine proteases have the triad of
residues Asp102, His57, and Ser195 at the
active site (chymotrypsin numbering system).
X-ray crystallographic studies reveal
that these three residues are in close proximity,
which suggests they may serve as a
functional interacting unit responsible for
bond formation and cleavage during catalysis
(1). Numerous chemical and physical

18. M. Krieger et al., ibid. 15, 3458 (1976).

19. M. N. G. James, A. R. Sielecki, G. D. Brayer, L. T. J.
Delbaere, C. A. Bauer, J. Mol. Biol. 144, 43 (1980).

20. P. H. Morgan et al., Proc. Natl. Acad. Sci. U.S.A. 69,
3312 (1972).

21. A. A. Kossiakoff et al., Biochemistry 16, 654 (1977);
H. Fehlhammer, W. Bode, R. Huber, J. Mol. Biol.
111, 415 (1977).

22. J. L. Chambers et al., Biochem. Biophys. Res. Commun.
59, 70 (1974).

23. M. O. Jones and R. M. Stroud, Biochemistry, in press.

24. D. M. Blow et al., Nature (London) 221, 337
(1969).

25. C. S. Craik et al., J. Biol. Chem. 259, 14255 (1984).

26. The coordinates were obtained from the Protein
Data Bank at Brookhaven National Laboratory.

27. We thank J. Sadowsky, C. Nielsen, and E. Goldsmith
for assistance with Area Detector data collection
and processing and B. Montfort for assistance
with crystallographic refinement calculations. We
gratefully acknowledge grant support from NIH:
AM31507 to S.R.S., GM24485 to R.M.S., and
AM26081 to R.J.F.; from NSF: DMB8608086 to
C.S.C. and PCM830610 to W.J.R.; a Bristol Meyer
grant of Research Corporation and a CCRC grant
to C.S.C. The coordinates of the D102N trypsin
structure at pH 6 have been submitted to the Protein
Data Bank at Brookhaven National Laboratory.

29 September 1986; accepted 39 May 1987 
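
The occupancy estimate described in note 11 reduces to a ratio of two integrated difference densities. A minimal sketch in Python follows, assuming hypothetical density integrals (the notes report only the resulting occupancies of 0.67 and 0.33, not the raw integrals); the function name is illustrative and not taken from the authors' software.

```python
def relative_occupancies(density_gauche, density_trans):
    """Convert two integrated difference densities into fractional occupancies.

    Assumes both integrals were taken over comparable atom sets for each
    conformer, so that their ratio approximates the occupancy ratio.
    """
    total = density_gauche + density_trans
    if total <= 0:
        raise ValueError("integrated densities must sum to a positive value")
    return density_gauche / total, density_trans / total


# Hypothetical integrals in arbitrary units (not values from the paper).
occ_gauche, occ_trans = relative_occupancies(2.0, 1.0)
print(round(occ_gauche, 2), round(occ_trans, 2))
```

Normalizing by the sum forces the two occupancies to total 1, which matches the treatment of the residue as statistically disordered between exactly two conformers.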



studies indicate that Ser195 and His57 play
crucial roles in catalysis. For example, selective
reaction of Ser195 with diisopropylfluor-



C. S. Craik, Departments of Pharmaceutical Chemistry
and of Biochemistry and Biophysics, University of California,
San Francisco, San Francisco, CA 94143-0446.
S. Roczniak, C. Largman, W. J. Rutter, Hormone
Research Institute and Department of Biochemistry and
Biophysics, University of California, San Francisco, San
Francisco, CA 94143-0448.



*Present address: NutraSweet Company, Mount Prospect,
IL 60056.

†Present address: Veterans Administration Hospital,
Martinez, CA 94553, and Departments of Internal
Medicine and Biological Chemistry, University of California,
Davis, CA 95616.



The Catalytic Role of the Active Site Aspartic Acid in 
Serine Proteases 

Charles S. Craik, Steven Roczniak,* Corey Largman,†
William J. Rutter



The role of the aspartic acid residue in the serine protease catalytic triad Asp, His, and
Ser has been tested by replacing Asp102 of trypsin with Asn by site-directed mutagenesis.
The naturally occurring and mutant enzymes were produced in a heterologous
expression system, purified to homogeneity, and characterized. At neutral pH the
mutant enzyme activity with an ester substrate and with the Ser195-specific reagent
diisopropylfluorophosphate is approximately 10^4 times less than that of the unmodified
enzyme. In contrast to the dramatic loss in reactivity of Ser195, the mutant trypsin
reacts with the His57-specific reagent, tosyl-L-lysine chloromethylketone, only five
times less efficiently than the unmodified enzyme. Thus, the ability of His57 to react
with this affinity label is not severely compromised. The catalytic activity of the mutant
enzyme increases with increasing pH so that at pH 10.2 the kcat is 6 percent that of
trypsin. Kinetic analysis of this novel activity suggests this is due in part to participation
of either a titratable base or of hydroxide ion in the catalytic mechanism. By
demonstrating the importance of the aspartate residue in catalysis, especially at
physiological pH, these experiments provide a rationalization for the evolutionary
conservation of the catalytic triad.



21 AUGUST 1987 



REPORTS 909 





Exhibit 36 






J. Biochem. 124, 784-789 (1998)

A Novel Low-Density Lipoprotein Receptor-Related Protein with Type
II Membrane Protein-Like Structure Is Abundant in Heart¹

Yasuhiro Tomita, Dong-Ho Kim, Kenta Magoori, Takahiro Fujino, and
Tokuo T. Yamamoto²

Tohoku University Gene Research Center, Sendai 981-8555
Received for publication, May 21, 1998

We report herein the identification of a novel member of the low-density lipoprotein
receptor (LDLR) family termed LDLR-related protein 4 (LRP4). Murine LRP4 cDNA
encodes a 1113-amino-acid type II membrane-like protein with eight ligand-binding
repeats in two clusters. Southern blot analysis of genomic DNA from several different
organisms suggests the presence of LRP4 homologues in chicken lacking the gene encoding
apolipoprotein E, which is recognized by the ligand-binding repeats of LDLR. LRP4
transcripts were detected almost exclusively in heart in mouse and humans. Despite the
presence of the ligand-binding repeats, COS cells transfected with LRP4 did not show
surface-binding of β-migrating very-low-density lipoprotein, suggesting that LRP4 plays
a role in a pathway other than lipoprotein metabolism.

Key words: LDL receptor family, LDL receptor related protein, membrane protein, 
receptor. 






The low-density lipoprotein receptor (LDLR) family is a
growing supergene family that includes LDLR itself (1),
apolipoprotein E (apoE) receptor 2 (apoER2) (2, 3), very-
low-density lipoprotein receptor (VLDLR) (4, 5), insect
vitellogenin receptors (6, 7), LDLR-related protein/α2-
macroglobulin receptor (LRP1) (8), a kidney autoantigen
gp330/megalin (LRP2) (9, 10), and a recently identified
member termed LDLR relative with 11 binding repeats
(LR11/sorLA1) (11, 12). All members of this gene family
contain the following five structural motifs: (i) complement-type
cysteine-rich repeats, termed LDLR ligand-
binding repeats or LDLR class A repeats; (ii) cysteine-rich
epidermal growth factor (EGF) precursor-type repeats,
termed growth factor repeats or LDLR class B repeats; (iii)
cysteine-poor spacer regions, with five copies of the sequence
YWTD, separating the growth-factor repeats; (iv) a
single membrane-spanning region; and (v) a cytoplasmic
region with at least one copy of the "NPXY" internalization
signal. LDLR is the best characterized protein in this
superfamily and the relationship between structure and
function for each module of LDLR has been elucidated by
analysis of mutations in patients with familial hypercholesterolemia
(13, 14).

¹ This work was supported by the Japan Society for the Promotion of
Science Grant RFTP97L00803. Sequence data from this article have
been deposited with the EMBL/GenBank Data Libraries under
accession No. AB013874.

² To whom correspondence should be addressed: Fax: +81-22-263-
9295, E-mail: yama@biochem.tohoku.ac.jp

Abbreviations: apoE, apolipoprotein E; apoER2, apolipoprotein E
receptor 2; LDLR, low-density lipoprotein receptor; LRP, low-density
lipoprotein receptor-related protein; VLDLR, very-low-density
lipoprotein receptor; β-VLDL, β-migrating very-low-density
lipoprotein.

© 1998 by The Japanese Biochemical Society.



Among members of the LDLR family, VLDLR and
apoER2 most closely resemble LDLR in structure and, like
LDLR, bind apoE-rich β-VLDL with high affinity (2-4). In
the chicken, VLDLR is expressed almost exclusively in
oocytes and mediates uptake of yolk precursors, VLDL and
vitellogenin (15). This receptor-mediated process is critical
in non-mammalian vertebrate oogenesis: female chicken
mutants lacking VLDLR are sterile (16). In contrast to
the chicken, mammalian VLDLR mRNA is abundant in
heart, skeletal muscle, brain, and adipose tissues (4).
Frykman et al. have shown that mice lacking VLDLR
exhibit modest decreases in body weight, body mass index,
and adipose tissue mass, while their plasma cholesterol
levels, triacylglycerol levels, and lipoprotein profiles are
not altered (17). Furthermore, knockout mice lacking both
VLDLR and LDLR exhibit a modest hypercholesterolemia
(17), whereas apoE knockout mice exhibit a profound
hypercholesterolemia (18). These data suggest the presence
of other apoE receptors.

To extend our studies on receptors that may play a role
in the clearance of apoE-containing lipoproteins from the
circulation, we have been characterizing cDNAs belonging
to the LDLR superfamily. In a previous study, we
characterized a new LDLR-related protein termed LRP3
(19). Human and rat LRP3 consist of a 770-amino-acid
type I membrane protein with the following regions: a
putative signal sequence; two isoleucine/leucine/valine-
rich regions with an RGD sequence; two ligand-binding
repeat regions; a putative transmembrane region; and a
proline-rich cytoplasmic region with a tyrosine-based
internalization signal. Despite the presence of the ligand-
binding repeats, CHO cells transfected with LRP3 failed to
bind β-VLDL.

In this study, we have isolated a near full-length cDNA
encoding a new member of the LDLR family, termed



784 J. Biochem.



Low-Density Lipoprotein Receptor-Related Protein 4



[Fig. 1: nucleotide and deduced amino acid sequence listing of murine LRP4 cDNA (accession No. AB013874)]



Vol. 124, No. 4, 1998



786 



Fig. 1. Nucleotide and deduced amino acid sequence of murine
LRP4 cDNA. Nucleotide and amino acid residues are numbered on
the left. Nucleotide 1 is the A of the initiator AUG codon. Negative
numbers refer to the 5'-untranslated region. Two in-frame translation
termination codons at -87 and 3342 are indicated by asterisks. The
putative transmembrane region is boxed in black. Cysteine residues
are circled and the ligand-binding motif SDE and similar sequences
are boxed. Potential N-linked glycosylation sites are underlined and
a potential polyadenylation signal is doubly underlined.



LDLR-related protein 4 (LRP4) and describe here the
molecular characterization of this new receptor-like protein.

MATERIALS AND METHODS 

Standard Procedures — Standard molecular biology techniques
were carried out essentially as described by Sambrook
et al. (20). Nucleotide sequencing was performed by
the dideoxy-chain termination method (21) using M13
primers, T3 and T7, or specific internal primers. Sequence
reactions were carried out using Taq DNA polymerase with
fluorescently labeled nucleotides on an Applied Biosystems
Model 373A DNA sequencer. To analyze RNA in murine
and human tissues, commercially available Northern blots
(Clontech) were used for Northern blot analysis.

cDNA Cloning — A murine heart cDNA library was
constructed in pBluescript vector using poly(A) RNA and
the cDNA synthesis kit from Pharmacia. The library was
screened with a mixture of degenerate oligonucleotides
corresponding to a highly conserved amino acid sequence,
WRCDGD, among the ligand-binding domains of LDLR,
VLDLR, and apoER2: 5'-TGG(A/C)G(A/C/G/T)TG(C/T)-
GA(C/T)GG(A/C/G/T)GA-3'. Positive clones hybridizing
with the oligonucleotide probe were then reprobed with
LDLR and VLDLR probes to eliminate cDNAs for these
receptors. By screening 5 X 10' clones, we obtained one
positive clone that hybridized with the oligonucleotide
probe alone.

"Zoo" Southern Blot Analysis — Genomic DNAs (10 µg)
prepared from a normal man, a male BALB/c mouse, a
white Leghorn hen, and a female Xenopus laevis were
digested with a large excess of EcoRI for electrophoresis in
a 0.8% agarose gel, then transferred onto a nylon membrane.
The membrane was hybridized with the entire
region of murine LRP4 cDNA. Hybridization was at 42°C in
5x SSC, 5x Denhardt's solution, 200 µg/ml denatured
salmon sperm DNA, 50% (v/v) formamide, and 1% (w/v)
SDS. The blot was then washed twice with 0.3x SSC and
1% (w/v) SDS at 60°C, followed by autoradiography.

Expression of LRP4 cDNA in COS-7 Cells—To construct
an LRP4 expression plasmid (pLRP4-SRα), the
entire coding region of murine LRP4 cDNA was inserted
into an expression vector (pcDL-SRα296) (22) by multiple
ligations of restriction fragments. The expression plasmid
was transfected into COS-7 cells according to the transfection
protocol described by Chen and Okayama (23).

Lipoprotein Binding Assay — Rabbit β-VLDL (d < 1.006
g/ml) was prepared from the plasma of 1% cholesterol-fed
animals (24). 125I-labeled β-VLDL was prepared (25) and
its binding by the transfected cells was assayed according to
the procedure described previously (2).

RESULTS 

Isolation and Characterization of Murine LRP4 cDNA — 
A near full-length cDNA encoding a new member of the 
LDLR family, designated LDLR-related protein 4 (LRP4), 



[Fig. 2, panels A-C. Panel C legend: ligand-binding repeat (A repeat); EGF precursor homology region (B repeat); EGF precursor homology region (YWTD repeat); transmembrane region; NPXY or NPXY-like sequence; non-repetitive sequence; O-linked sugar region. Proteins shown: LRP4, LRP3, LDLR, VLDLR, apoER2, LRP1, LRP2.]




Fig. 2. Functional regions in LRP4. (A) Hydropathy plot analysis
of the murine LRP4 protein. The numbers on the x-axis correspond to
the positions of the amino acid residues in the protein. The putative
transmembrane (TM) region is shown by a thick line. (B) Comparison
of the amino acids in the eight ligand-binding repeats of murine LRP4.
Amino acid alignment was optimized and gaps were introduced to
match the six cysteine residues in each repeat. Amino acid residues
conserved in more than 50% of the repeats are boxed and shown below
as a consensus sequence. The consensus sequence of the ligand-binding
repeats of human LDLR (1) is also represented. (C) Schematic
representation of LRPs 1-4, apoER2, LDLR, and VLDLR.




was isolated from a murine heart cDNA library by using a
mixture of degenerate oligonucleotides corresponding to
the highly conserved amino acid sequence WRCDGD
among the ligand-binding domains of LDLR, VLDLR, and
apoER2. Figure 1 shows the nucleotide and deduced amino
acid sequences of the cDNA, which has an open reading
frame of 3,339 bp corresponding to 1,113 amino acids with
a calculated molecular mass of approximately 123 kDa.
The putative initial methionine was preceded by an in-
frame termination codon present 87 nucleotides upstream.
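
The relationship between the reported open reading frame length, residue count, and molecular mass can be checked with simple arithmetic. The sketch below is illustrative only: the average residue mass of 110 Da is a common rule-of-thumb assumption, not a value from the paper, whose figure of approximately 123 kDa would have been calculated from the actual sequence.

```python
ORF_BP = 3339            # open reading frame length reported in the text
AVG_RESIDUE_DA = 110.0   # assumed average mass per amino acid residue


def orf_to_protein(orf_bp, avg_residue_da=AVG_RESIDUE_DA):
    """Return (residue count, approximate mass in kDa) for a coding region."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    residues = orf_bp // 3
    return residues, residues * avg_residue_da / 1000.0


residues, mass_kda = orf_to_protein(ORF_BP)
print(residues, round(mass_kda, 1))
```

Under this assumption the estimate comes out within about 1 kDa of the reported mass, which is as close as a composition-independent average can be expected to get.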
A hydropathy plot (26) of the deduced amino acid
sequence of murine LRP4 shows the presence of a hydrophobic
region at amino acid residues 113-133 (boxed in
black in Fig. 1 and identified with thick lines in Fig. 2A).
This hydrophobic sequence of 21 amino acids strongly
resembles the transmembrane region of membrane proteins,
being flanked by a positively charged amino acid
(arginine) on the N-terminal side. This structural feature
suggests that LRP4 has a type II transmembrane protein
structure (amino terminus in the cytosol).
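
A hydropathy plot of the kind described above is a sliding-window average of per-residue hydrophobicity values. The sketch below assumes the Kyte-Doolittle scale and a 21-residue window; this section does not name the scale or window used for Fig. 2A, and the test sequence is a synthetic toy, not the LRP4 sequence.

```python
# Kyte-Doolittle hydropathy index per residue (assumed scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=21):
    """Mean hydropathy of every full window along the sequence."""
    if window > len(seq):
        raise ValueError("window is longer than the sequence")
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def most_hydrophobic_window(seq, window=21):
    """Return (first, last, score) of the top window, 1-based inclusive."""
    profile = hydropathy_profile(seq, window)
    start = max(range(len(profile)), key=profile.__getitem__)
    return start + 1, start + window, profile[start]


# Synthetic toy: a 21-residue leucine stretch flanked by charged residues.
toy = "DEKR" * 10 + "L" * 21 + "DEKR" * 10
first, last, score = most_hydrophobic_window(toy)
print(first, last, round(score, 2))
```

A window of about 20 residues is a conventional choice for membrane-spanning segments because it roughly matches the length of a helix crossing the bilayer, which is consistent with the 21-residue hydrophobic stretch reported in the text.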
The C-terminal side of the putative transmembrane
domain contains two clusters of cysteine-rich repeats that
resemble the ligand-binding repeats (class A motifs) of
LDLR: one cluster contains three repeats and the other has
five (Fig. 2, B and C). Each repeat has six completely
conserved cysteines and a highly conserved C-terminal
SDE tripeptide, which forms a part of the ligand-binding
site of LDLR (Fig. 2B). Unlike LDLR, VLDLR, apoER2,
LRP1, and LRP2, there are neither YWTD repeats nor
growth factor repeats (class B motifs) in the murine LRP4
sequence (Fig. 2C).

The cytoplasmic domains of LDLR, VLDLR, apoER2,
LRP1, and LRP2 contain one or two copies of a highly
conserved coated pit signal, FXNPXY (23). In the putative
cytoplasmic region (N-terminus), we found neither a
typical FXNPXY sequence nor a similar tyrosine-based
sequence (27). Further studies are required to determine
whether LRP4 may function as an endocytic receptor.

Southern Blot Analysis of the LRP4 Genes in Various
Species—To test the possibility that LRP4 homologue
genes might also be present in nonmammalian vertebrates
(known to lack the apoE gene), Southern blot analysis of
genomic DNA from several different organisms was carried
out. This "zoo blot" (containing DNAs of human, mouse,
chicken, and frog) was hybridized with the entire coding
region of the murine cDNA under relatively stringent
conditions (see "MATERIALS AND METHODS"). As shown in
Fig. 3, intense hybridization signals are present in mouse,






and fainter but significant signals can also be detected in
human and chicken DNAs. These data suggest the presence
of LRP4 homologues in chicken lacking the gene encoding
apoE, which is recognized by the ligand-binding repeats of
mammalian LDLR, VLDLR, and apoER2.

Expression of LRP4 Transcripts—Northern blot analysis
of RNA from various murine tissues revealed hybridization
of the LRP4 probe to a major transcript of 5.0 kb in
mouse, with the highest expression in heart, relatively high
levels in testis, and much lower levels in kidney and lung
(Fig. 4A). Figure 4B shows a blot hybridization of RNA
from various human tissues probed with the murine cDNA.
In human tissues, major transcripts of 5, 2.6, and 2.3 kb
and a minor transcript of 4 kb are detected almost exclusively
in heart. A fainter but significant signal of 2 kb can
also be detected in skeletal muscle and testis. The transcripts
of 2.0, 2.3, 2.6, and 4 kb detected in human tissues
may be a consequence of alternative splicing.

β-VLDL Binding—To test the possibility that LRP4
might bind apoE-rich β-VLDL (as do LDLR, VLDLR, and
apoER2), an expression plasmid containing the entire
coding region of murine LRP4 cDNA was constructed and
introduced into COS-7 cells, and ligand-binding activity
was measured using 125I-labeled β-VLDL. As shown in Fig.




Fig. 3. Genomic Southern blot analysis of LRP4-related sequences
in various eukaryotic species. A blot containing 10 µg of
EcoRI-digested DNA from the indicated species was hybridized with
the entire coding region of murine LRP4 cDNA under the conditions
described in "MATERIALS AND METHODS" and exposed to Kodak
XAR-5 film with an intensifying screen at -80°C for 16 h.







Fig. 4. Expression of LRP4 transcripts
in mouse (A) and humans (B).
Poly(A) RNA (2 µg) from the indicated
murine (A) and human (B) tissues was
probed with ³²P-labeled murine LRP4
cDNA. The filters were exposed to Kodak
XAR-5 film with an intensifying screen
at −80°C for 14 h. Control hybridization
with a rat glyceraldehyde-3-phosphate
dehydrogenase (GAPDH) is shown in the
lower portion.



Vol. 124, No. 4, 1998






Fig. 5. Transient expression of LRP4
in COS cells. (A) Surface binding of ¹²⁵I-labeled
β-VLDL. COS cells transfected
with an expression plasmid encoding
murine LRP4 (pLRP4-SRα), human
apoER2 (pNR1), or the parental vector of
pLRP4-SRα (pcDL-SRα296) were incubated
for 2 h at 4°C with the indicated
concentrations of ¹²⁵I-β-VLDL (540 cpm/ng),
after which the values for surface-bound
β-VLDL were determined as
described under "MATERIALS AND
METHODS." (B) Northern blot analysis
of LRP4 transcripts in COS cells transfected
with murine LRP4 expression
plasmid (LRP4), or the parental vector
(pcDL-SRα296). Total RNA (10 µg)
from the indicated transfected cells was
probed with ³²P-labeled murine LRP4



Y. Tomita et al.





cDNA. The filter was exposed to Kodak XAR-5 film with an intensifying screen at −80°C for 12 h.



5A, the level of surface-bound β-VLDL in LRP4-transfected
cells was similar to that in cells transfected with
equal amounts of the parental vector, despite the high
levels of accumulation of 3.0-kb LRP4 mRNA (lacking
approximately 2.0 kb in the 3'-untranslated region) in the
LRP4-transfected cells (Fig. 5B). In control experiments,
marked induction of ¹²⁵I-β-VLDL binding was observed in
cells transfected with human apoER2.
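
The arithmetic behind a surface-binding curve of this kind can be sketched as follows. The only value taken from the text is the specific activity of the labeled ligand stated in the figure legend (540 cpm/ng). The function name, the background-subtraction step, and the example counts are assumptions made for illustration and do not reproduce the authors' procedure.

```python
# Specific activity of the labeled ligand, as stated in the figure legend.
SPECIFIC_ACTIVITY_CPM_PER_NG = 540.0


def bound_ligand_ng(sample_cpm, background_cpm=0.0,
                    specific_activity=SPECIFIC_ACTIVITY_CPM_PER_NG):
    """Convert measured counts (cpm) to ng of bound ligand.

    Background subtraction is an assumed step; a negative net count
    is clipped to zero.
    """
    if specific_activity <= 0:
        raise ValueError("specific activity must be positive")
    net_cpm = max(sample_cpm - background_cpm, 0.0)
    return net_cpm / specific_activity


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(bound_ligand_ng(5940.0, background_cpm=540.0))
```

Comparing the values obtained for receptor-transfected and vector-transfected cells at each ligand concentration would then give the kind of contrast described above.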

DISCUSSION 

In the present study, we have shown the structure and 
expression of a novel member of the LDLR family termed 
LRP4. The most interesting feature of LRP4 is that, unlike 
other members of the LDLR family, this protein has a type 
II membrane protein-like structure. The hydropathy plot
analysis shows the presence of a hydrophobic region at
amino acid residues 113-133 of murine LRP4. There are
eight ligand-binding repeats clustered into two regions in
the C-terminal side of this putative transmembrane region.
Based on the presence of ligand-binding repeats in the
extracellular regions of other LDLR family members, it
seems reasonable to predict that the C-terminal side of the
putative transmembrane region constitutes the extracellular
region of the protein.
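
A hydropathy analysis of the kind referred to above can be sketched with the Kyte and Doolittle scale (reference 26). This is a minimal illustration, not the authors' analysis: the 19-residue window and the 1.6 cutoff are commonly quoted rule-of-thumb values assumed here, the function names are hypothetical, and the test sequence is a toy string rather than the LRP4 sequence.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return (center residue number, mean hydropathy) for each full window.

    Residue numbers are 1-based; the window size is an assumed default.
    """
    seq = seq.upper()
    half = window // 2
    profile = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        score = sum(KD[aa] for aa in segment) / window
        profile.append((start + half + 1, score))
    return profile


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Windows whose mean hydropathy reaches the (assumed) cutoff."""
    return [(c, s) for c, s in hydropathy_profile(seq, window) if s >= threshold]


if __name__ == "__main__":
    # Toy sequence: a 21-residue leucine stretch flanked by charged residues.
    toy = "D" * 10 + "L" * 21 + "K" * 10
    for center, score in hydrophobic_windows(toy):
        print(center, round(score, 2))
```

A stretch of consecutive windows above the cutoff is the usual signature of a candidate membrane-spanning segment; whether the flanking region faces the cytosol or the exterior has to be argued separately, as is done in the text.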

Despite the presence of eight ligand-binding repeats,
COS cells transfected with LRP4 failed to bind β-VLDL,
suggesting that LRP4 does not function in lipoprotein
metabolism. Of the four clusters of ligand-binding repeats
in LRP2, the recognition site for apoE has been mapped to
the second cluster (28). This suggests that these clusters
are not functionally equal, despite their structural similarity.
Therefore, the ligand-binding repeats in LRP4 may be
functionally different from those in other family members
that bind β-VLDL.

Although the exact function and ligands of LRP4 remain 
unclear, the abundant expression of LRP4 transcripts in 
heart is noteworthy. Based on the structural features of 
LRP4 and its almost exclusive expression in the heart,
LRP4 may play a role as a surface receptor that is related 
to cardiac function. Further studies are necessary to 
elucidate the exact role of this structurally interesting 
molecule. 



We thank Kyoko Ogamo and Nami Suzuki for secretarial assistance. 



REFERENCES 



1. Yamamoto, T., Davis, C.G., Brown, M.S., Schneider, W.J.,
Casey, M.L., Goldstein, J.L., and Russell, D.W. (1984) The
human LDL receptor: a cysteine-rich protein with multiple Alu
sequences in its mRNA. Cell 39, 27-38

2. Kim, D.H., Iijima, H., Goto, K., Sakai, J., Ishii, H., Kim, H.J.,
Suzuki, H., Kondo, H., Saeki, S., and Yamamoto, T. (1996)
Human apolipoprotein E receptor 2. A novel lipoprotein receptor
of the low density lipoprotein receptor family predominantly
expressed in brain. J. Biol. Chem. 271, 8373-8380

3. Kim, D.H., Magoori, K., Inoue, T.R., Mao, C.C., Kim, H.J.,
Suzuki, H., Fujita, T., Endo, Y., Saeki, S., and Yamamoto, T.T.
(1997) Exon/intron organization, chromosome localization, alternative
splicing, and transcription units of the human apolipoprotein
E receptor 2 gene. J. Biol. Chem. 272, 8498-8504

4. Takahashi, S., Kawarabayasi, Y., Nakai, T., Sakai, J., and
Yamamoto, T. (1992) Rabbit very low density lipoprotein
receptor: a low density lipoprotein receptor-like protein with
distinct ligand specificity. Proc. Natl. Acad. Sci. USA 89, 9252-9256

5. Sakai, J., Hoshino, A., Takahashi, S., Miura, Y., Ishii, H.,
Suzuki, H., Kawarabayasi, Y., and Yamamoto, T. (1994) Structure,
chromosome location, and expression of the human very low
density lipoprotein receptor gene. J. Biol. Chem. 269, 2173-2182

6. Schonbaum, C.P., Lee, S., and Mahowald, A.P. (1995) The
Drosophila yolkless gene encodes a vitellogenin receptor belonging
to the low density lipoprotein receptor superfamily. Proc.
Natl. Acad. Sci. USA 92, 1485-1489

7. Sappington, T.W., Kokoza, V.A., Cho, W.L., and Raikhel, A.S.
(1996) Molecular characterization of the mosquito vitellogenin
receptor reveals unexpected high homology to the Drosophila
yolk protein receptor. Proc. Natl. Acad. Sci. USA 93, 8934-8939

8. Herz, J., Hamann, U., Rogne, S., Myklebost, O., Gausepohl, H.,
and Stanley, K.K. (1988) Surface location and high affinity for
calcium of a 500 kD liver membrane protein closely related to the
LDL-receptor suggest a physiological role as lipoprotein receptor.
EMBO J. 7, 4119-4127

9. Raychowdhury, R., Niles, J.L., McCluskey, R.T., and Smith,
J.A. (1989) Autoimmune target in Heymann nephritis is a
glycoprotein with homology to the LDL receptor. Science 244,
1163-1165

10. Saito, A., Pietromonaco, S., Loo, A.K., and Farquhar, M.G.
(1994) Complete cloning and sequencing of rat gp330/"megalin,"
a distinctive member of the low density lipoprotein receptor gene
family. Proc. Natl. Acad. Sci. USA 91, 9725-9729






Low Density Lipoprotein Receptor-Related Protein 4



11. Bujo, H., Hermann, M., Schneider, W.J., and Nimpf, J. (1996)
A new branch of the LDL-receptor family tree: VLDL-receptors.
Z. Gastroenterol. 3, 124-126

12. Jacobsen, L., Madsen, P., Moestrup, S.K., Lund, A.H., Tommerup,
N., Nykjaer, A., Sottrup-Jensen, L., Gliemann, J., and
Petersen, C.M. (1996) Molecular characterization of a novel
human hybrid-type receptor that binds the alpha2-macroglobulin
receptor-associated protein. J. Biol. Chem. 271, 31379-31383

13. Russell, D.W., Lehrman, M.A., Südhof, T.C., Yamamoto, T.,
Davis, C.G., Hobbs, H.H., Brown, M.S., and Goldstein, J.L.
(1987) The LDL receptor in familial hypercholesterolemia: Use
of human mutations to dissect a membrane protein. Cold Spring
Harbor Symp. Quant. Biol. 51, 811-819

14. Hobbs, H.H., Russell, D.W., Brown, M.S., and Goldstein, J.L.
(1990) The LDL receptor locus in familial hypercholesterolemia:
mutational analysis of a membrane protein. Annu. Rev. Genet.
24, 133-170

15. Bujo, H., Hermann, M., Kaderli, M.O., Jacobsen, L., Sugawara,
S., Nimpf, J., Yamamoto, T., and Schneider, W.J. (1994)
Chicken oocyte growth is mediated by an eight ligand binding
repeat member of the LDL receptor family. EMBO J. 13, 5165-5175

16. Bujo, H., Yamamoto, T., Hayashi, K., Hermann, M., Nimpf, J.,
and Schneider, W.J. (1995) Mutant oocytic low density lipoprotein
receptor gene family member causes atherosclerosis and
female sterility. Proc. Natl. Acad. Sci. USA 92, 9905-9909

17. Frykman, P.K., Brown, M.S., Yamamoto, T., Goldstein, J.L.,
and Herz, J. (1995) Normal plasma lipoproteins and fertility in
gene-targeted mice homozygous for a disruption in the gene
encoding very low density lipoprotein receptor. Proc. Natl. Acad.
Sci. USA 92, 8453-8457

18. Ishibashi, S., Herz, J., Maeda, N., Goldstein, J.L., and Brown,
M.S. (1994) The two-receptor model of lipoprotein clearance:
test of the hypothesis in "knockout" mice lacking the low density
lipoprotein receptor, apolipoprotein E, or both proteins. Proc.
Natl. Acad. Sci. USA 91, 4431-4435

19. Ishii, H., Kim, D.H., Fujita, T., Endo, Y., Saeki, S., and
Yamamoto, T.T. (1998) cDNA cloning of a new low density
lipoprotein receptor-related protein and mapping of its gene
(LRP3) to chromosome bands 19q12-13.2. Genomics 51, 132-135

20. Sambrook, J., Fritsch, E.F., and Maniatis, T. (1989) Molecular
Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor
Laboratory, Cold Spring Harbor, NY

21. Sanger, F., Nicklen, S., and Coulson, A.R. (1977) DNA sequencing
with chain-terminating inhibitors. Proc. Natl. Acad. Sci.
USA 74, 5463-5467

22. Takebe, Y., Seiki, M., Fujisawa, J., Hoy, P., Yokota, K., Arai,
K., Yoshida, M., and Arai, N. (1988) SR alpha promoter: an
efficient and versatile mammalian cDNA expression system,
composed of the simian virus 40 early promoter and the R-U5
segment of human T-cell leukemia virus type 1 long terminal
repeat. Mol. Cell. Biol. 8, 466-472

23. Chen, C.A. and Okayama, H. (1987) High-efficiency transformation
of mammalian cells by plasmid DNA. Mol. Cell. Biol. 7,
2745-2752

24. Kovanen, P.T., Brown, M.S., Basu, S.K., Bilheimer, D.W., and
Goldstein, J.L. (1981) Saturation and suppression of hepatic
lipoprotein receptors: a mechanism for the hypercholesterolemia
of cholesterol-fed rabbits. Proc. Natl. Acad. Sci. USA 78, 1396-1400

25. Goldstein, J.L., Basu, S.K., and Brown, M.S. (1983) Receptor-mediated
endocytosis of low-density lipoprotein in cultured cells.
Methods Enzymol. 98, 241-260

26. Kyte, J. and Doolittle, R.F. (1982) A simple method for displaying
the hydropathic character of a protein. J. Mol. Biol. 157, 105-132

27. Naim, H.Y. and Roth, M.G. (1994) Characteristics of the internalization
signal in the Y543 influenza virus hemagglutinin
suggest a model for recognition of internalization signals containing
tyrosine. J. Biol. Chem. 269, 3928-3933

28. Orlando, R.A., Exner, M., Czekay, R.P., Yamazaki, H., Saito, A.,
Ullrich, R., Kerjaschki, D., and Farquhar, M.G. (1997) Identification
of the second cluster of ligand-binding repeats in megalin
as a site for receptor-ligand interactions. Proc. Natl. Acad. Sci.
USA 94, 2368-2373








Exhibit 37 



The Journal of Biological Chemistry

© 1991 by The American Society for Biochemistry and Molecular Biology, Inc.



Vol. 266, No. 25, Issue of September 5, pp. 16948-16953, 1991

Printed in U.S.A. 



Hepsin, a Cell Membrane-associated Protease 

CHARACTERIZATION, TISSUE DISTRIBUTION, AND GENE LOCALIZATION*

(Received for publication, August 2, 1990)

Akihiko Tsuji, Adrian Torres-Rosado, Toshiro Arai, Michelle M. Le Beau‡§, Richard S. Lemons¶,
Shan-Ho Chou‖**, and Kotoku Kurachi‡‡

From the Department of Human Genetics, University of Michigan Medical School, Ann Arbor, Michigan 48109, the ‡Section of
Hematology and Oncology, University of Chicago, Chicago, Illinois 60637, the ¶Department of Pediatrics, University of Utah, Salt
Lake City, Utah 84132, and the ‖Department of Biochemistry, University of Washington, Seattle, Washington 98109



Hepsin, a putative membrane-bound serine protease,
was originally identified as a human liver cDNA clone
(Leytus, S. P., Loeb, K. R., Hagen, F. S., Kurachi, K.,
and Davie, E. W. (1988) Biochemistry 27, 1067-1074).
In the present study the human hepsin gene was
localized to chromosome 19 at q11-13.2. The messenger
RNA of hepsin is 1.85 kilobases in size and present
in most tissues, with the highest level in liver. Hepsin
is synthesized as a single polypeptide chain, and its
mature form of 51 kDa was found in various mammalian
cells including HepG2 cells and baby hamster kidney
cells. It is present in the plasma membrane in a
molecular orientation of type II membrane-associated
proteins, with its catalytic subunit (carboxyl-terminal
half) at the cell surface, and its amino terminus facing
the cytosol. Hepsin is found neither in cytosol nor in
culture media. The results obtained suggest that hepsin
has an important role(s) in cell growth and function.



Proteases play important roles in a number of physiological 
and pathological processes such as protein catabolism, blood 
coagulation, fibrinolysis, and in the complement system (1- 
3). The importance of proteases in many phenomena includ- 
ing cell proliferation, inflammation, development, tumor 
growth, and metastasis are also well described. Their involve- 
ment in carcinogenesis as well as in cell growth is further 
supported by the anticarcinogenic and anti-cell growth effects 
of protease inhibitors (4, 5). Most of these are non-membrane
bound intra- or extracellular proteases. Recently, several
membrane-associated proteases have been described. A cell
surface protease with molecular weight of 67,000 has been
reported (5-7). This protease, which is inhibited by α1-antitrypsin
(5), was found to be essential for cell proliferation and
was suggested to be involved in various biological processes
of cells, in addition to the degradation of extracellular matrix
proteins. Guanidinobenzoatase, which can cleave fibronectin
at Gly-Arg-Gly-Asp, the sequence involved in the attachment
of fibronectin to cell surfaces, has been described (8-10). This
protease is located on the surface of most tumor cells, as well 
as in the fluid surrounding tumor cells. A fluorescent compet- 

* This work was supported in part by National Institutes of Health 
Grant HL38644 (to K. K.). The costs of publication of this article 
were defrayed in part by the payment of page charges. This article 
must therefore be hereby marked "advertisement" in accordance with
18 U.S.C. Section 1734 solely to indicate this fact. 

§ Scholar of the Leukemia Society of America. 

** Associate member of the Howard Hughes Medical Institute, 
University of Washington. 

‡‡ To whom all correspondence and reprint requests should be
addressed. 



itive inhibitor has also been used to localize this protease on 
the tumor cell surface (9). A trypsin-like membrane-associ- 
ated protease of an estimated molecular weight of 120,000 
which is present in the liver has been proposed to be involved 
in membrane protein turnover (11). A membrane-bound trypsin-like
protease has also been recognized in other cells such
as neuroblastoma cells (12). More recently, a 170-kDa mem- 
brane-bound protease (gelatinase) has been implicated in 
melanoma cell invasiveness (13). As described in these reports 
the cell surface proteases are considered to play an important 
role(s) in cell growth, cell invasion of other tissues (such as 
in metastasis), angiogenesis, and tissue rearrangement, in 
addition to various other cellular processes. 

Hepsin is a putative serine protease of 417 amino acid 
residues originally identified from cDNA clones isolated from 
human liver cDNA libraries (14). In a previous study, a 
synthetic oligonucleotide probe for the amino acid sequence 
Met-Phe-Cys-Ala-Gly, which is common to many serine proteases,
was successfully employed to isolate a number of
known and novel proteases including hepsin. Hepsin contains
a short hydrophobic amino acid sequence in the region near
the amino terminus while its carboxyl-terminal half is a
typical serine protease module. The hydrophobic sequence,
composed of 27 amino acid residues, is very similar to the
typical lipid bilayer membrane-spanning sequences found in
many other membrane-associated proteins (14). In our preliminary
immunostaining study, hepsin was found to be present
in cultured cells such as HepG2 and baby hamster kidney
(BHK)¹ cells (15). It is highly likely that hepsin may have a
role(s) similar to other cell membrane-bound proteases described
above in cell growth and in other cell functions.
Presently, however, the protein chemical and enzymatic properties
as well as the precise biological role(s) of hepsin are not
known.

In this report, we describe evidence that demonstrates the 
actual existence of hepsin in cells. This includes determina- 
tion of the estimated molecular weight of cellular hepsin, its 
subcellular localization, topology at the cell surface, chromo- 
somal localization of its gene, as well as its tissue distribution 
of expression. 

EXPERIMENTAL PROCEDURES 

Materials — Keyhole limpet hemocyanin and bovine pancreatic 
trypsin were obtained from Sigma. Freund's adjuvant was purchased 
from Difco. Synthetic peptides were made by an automated peptide 
synthesizer (Applied Biosystems, model 438) employing solid-phase 
t-butoxycarboxyl chemistry. These peptides had free α-carboxyl



¹ The abbreviations used are: BHK, baby hamster kidney; PBS,
phosphate-buffered saline; SDS, sodium dodecyl sulfate; EGTA,
[ethylenebis(oxyethylenenitrilo)]tetraacetic acid; kb, kilobase.



Characterization of Hepsin



groups. Activated CH-Sepharose 4B and Percoll were obtained from
Pharmacia. Tissue culture supplies and proteinase K were purchased
from Gibco/BRL (Life Technologies, Inc.). ¹⁴C-Labeled size marker
protein kits were obtained from Du Pont-New England Nuclear. All
radioactive nucleotides were purchased from Amersham Corp. The
protein assay kit as well as peroxidase-conjugated goat anti-rabbit
IgG were obtained from Bio-Rad. Adenosine 5'-phosphate and
4-chloro-1-naphthol were purchased from Sigma. Nylon membranes
(GeneScreen Plus) and the reticulocyte cell-free translation kit were
from New England Biolab (Du Pont).

Preparation of Antibodies — Five synthetic peptides (P1, amino acid
1-17; P2, 246-267; P3, 294-305; P4, 360-372; and P5, 398-417)
corresponding to the amino acid sequence of hepsin predicted from
the cDNA sequence (14) were employed to raise antibodies. P1, PM
(equimolar mixture of P2, P3, P4), and P5 correspond to the sequences
of the amino-terminal region, the catalytic subunit, and the
carboxyl-terminal region, respectively. P1, PM, or P5 were separately
coupled to the keyhole limpet hemocyanin by using glutaraldehyde
as a coupling agent as described by Reichlin (16). Rabbits were
immunized with a mixture of keyhole limpet hemocyanin-peptide
conjugate with Freund's adjuvant as follows: 5 mg of the conjugate in
complete Freund's adjuvant was injected subcutaneously on day 1,
and 1 mg of conjugate in incomplete Freund's adjuvant (1:1) was
injected on days 14, 21, and 28. After the third and fourth injection
on days 14 and 28, animals were bled from the ear vein to test the
titer. After the fifth week, blood samples were collected from the
animals by heart puncture, and were then used to prepare affinity
purified antibodies.

Affinity purification of these antibodies was carried out as follows:
a peptide column was prepared by adding peptides (10 mg dissolved in
20 ml of 0.1 M NaHCO3, pH 9.0) to the activated CH-Sepharose 4B
(1 g dry weight) (Pharmacia) according to the manufacturer's instructions.
Antiserum (3 ml), which was incubated with 8 mg of hemocyanin
for 1 h at room temperature, was applied to the column (2.6 ml)
followed by extensive washing with 10 mM sodium phosphate, pH
7.4, containing 0.15 M NaCl (PBS). The bound immunoglobulins were
then eluted with 0.1 M glycine-HCl buffer, pH 2.3, into 0.2 ml of 1 M
Tris-HCl buffer, pH 7.0. The eluate was dialyzed against PBS and
stored at −80 °C until use. Affinity purified antibodies prepared
against peptides P1, PM, and P5 were designated HAbP1, HAbPM,
and HAbP5, respectively. Immunoblot tests showed that HAbPM
and HAbP5 were highly specific, while HAbP1 was not, probably due
to cross-reactivity with similar amino acid sequences apparently
present in other proteins.

Cell Culture — HepG2 cells and BHK cells were cultured in Eagle's 
minimum essential medium (Gibco) supplemented with streptomycin, 
penicillin, and 10% fetal calf serum in a 5% CO2 incubator at 37 °C. 

Fractionation of Cellular Components by Percoll Density Gradient
Centrifugation — HepG2 cells (-^-O X 10^ cells) were harvested by
scraping, washed twice with PBS (1000 rpm for 5 min at 4 °C), and
resuspended in 3 ml of ice-cold STE solution (0.25 M sucrose, 10 mM
Tris-HCl buffer, pH 7.5, containing 2 mM EGTA) followed by homogenization
with a Tekmar Ultra-Turrax tissue homogenizer for 15
s. Plasma membrane and mitochondrial fractions were isolated by
the method of Belsham et al. (17) with minor modifications. Briefly,
the homogenates were centrifuged at 100 × g for 1 min. The pellets
obtained were resuspended in 2 ml of STE solution, homogenized,
and centrifuged. The two supernatants were combined and centrifuged
at 5000 × g for 15 min. A fraction (0.5 ml) of the pellet was
suspended in 1.0 ml of STE solution, dispersed in 10 ml of iso-osmotic
Percoll solution (7 volumes of Percoll, 1 volume of 2 M sucrose, 80
mM Tris-HCl buffer, pH 7.5, containing 8 mM EGTA and 32 volumes
of STE solution), and centrifuged for 20 min at 10,000 × g (Sorvall
and RC6C with SS34 rotor). Two membrane bands, one immediately
below the surface (plasma membrane) and the other close to the
bottom (mitochondria) were separately collected into 4 volumes of 10
mM Tris-HCl buffer, pH 7.5, containing 0.15 M NaCl. The two
fractions collected were then centrifuged at 10,000 × g for 3 min to
obtain membrane samples. The enrichment of the plasma membrane
prepared was monitored by assaying a plasma membrane-associated
lipoprotein, 5'-nucleotidase, according to Widnell and Unkeless (18).
The purity of the membrane preparation was further tested by assaying
activities of glucose 6-phosphatase (microsome marker) and
succinate-cytochrome c reductase (mitochondria marker) according to
Sottocasa et al. (19) with minor modifications. The microsome fraction
used as a control in the assay was prepared as previously
described (19, 20).
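
Monitoring a fractionation with marker enzymes, as described above, comes down to comparing specific activities between fractions. The following is a minimal sketch of that arithmetic. The function names, the choice of reference fraction for the contamination estimate, and all numerical values are assumptions for illustration; the text does not state how the authors computed their figures.

```python
def specific_activity(total_activity, protein_mg):
    """Enzyme activity per mg of protein (activity units are arbitrary)."""
    if protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    return total_activity / protein_mg


def fold_enrichment(fraction_sa, starting_sa):
    """Ratio of marker specific activity in a fraction to the starting material."""
    if starting_sa <= 0:
        raise ValueError("starting specific activity must be positive")
    return fraction_sa / starting_sa


def percent_contamination(marker_sa_in_fraction, marker_sa_in_reference):
    """Marker specific activity in the fraction as a percentage of that in a
    reference fraction rich in the contaminating organelle (assumed definition)."""
    if marker_sa_in_reference <= 0:
        raise ValueError("reference specific activity must be positive")
    return 100.0 * marker_sa_in_fraction / marker_sa_in_reference


if __name__ == "__main__":
    # Hypothetical 5'-nucleotidase measurements, for illustration only.
    crude = specific_activity(1000.0, 10.0)    # crude membranes
    purified = specific_activity(900.0, 0.5)   # plasma membrane band
    print(fold_enrichment(purified, crude))
```

With these invented inputs the plasma membrane marker is enriched 18-fold; the same ratio applied to a microsomal or mitochondrial marker gives a contamination estimate.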

An aliquot of the cell homogenates (above) was subjected to centrifugation
at 100,000 × g for 30 min at 4 °C in a SW41.1 rotor
(Beckman model L5-50 centrifuge). The supernatant collected was
used as the cytosol fraction. The nuclear fraction was prepared from
cell homogenates by sucrose density gradient centrifugation according
to Blobel and Potter (21).

Plasma membrane, mitochondria, and nuclear fractions were solubilized
with 0.2 ml of 10 mM Tris-HCl buffer, pH 7.5, containing
0.15 M NaCl and 0.5% (w/v) Nonidet P-40 and used for immunoblot
analysis.

Immunoblot Analysis — Protein concentration of the samples was
determined by the method of Bradford (22) with minor modifications.
Proteins of solubilized plasma membranes, mitochondria, nuclei,
cytosol, as well as culture media, were adjusted to a concentration of
0.5 mg/ml with gel loading buffer (62.5 mM Tris-HCl, pH 6.8, containing
10% glycerol, and 2% SDS) and incubated at 4 °C for 12 h or
at 95 °C for 3 min. An aliquot (7.5 µg of proteins) of the sample was
subjected to SDS-polyacrylamide gel (12%) electrophoresis employing
a Bio-Rad mini gel apparatus. The electrophoresed proteins were
transferred to a nitrocellulose filter according to Towbin et al. (23).
The blotted filter was blocked with 3% bovine serum albumin in 50
mM Tris-HCl, pH 7.5, containing 0.15 M NaCl (TBS) at 37 °C for 30
min, followed by incubation at room temperature for 2 h with antibodies
(P5) raised against the synthetic peptide containing the
carboxyl-terminal sequence of hepsin (500-fold dilution in TBS containing
1% bovine serum albumin). The filter was washed 3 times with
TBS containing 0.05% Tween 20 and incubated at room temperature
for 2 h with horseradish peroxidase-conjugated goat anti-rabbit IgG
which was diluted 1000-fold. The filters were then incubated with
TBS containing 4-chloro-1-naphthol (0.5 mg/ml) for 30 min at room
temperature.
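
The sample and antibody quantities above imply some simple bench arithmetic, sketched here for convenience. The concentrations, the protein load, and the dilution factors come from the text; the function names and the 10 ml working volume are assumptions chosen only for illustration.

```python
def load_volume_ul(protein_ug, concentration_mg_per_ml):
    """Volume (µl) that contains the requested protein mass.

    1 mg/ml equals 1 µg/µl, so the division gives microliters directly.
    """
    if concentration_mg_per_ml <= 0:
        raise ValueError("concentration must be positive")
    return protein_ug / concentration_mg_per_ml


def stock_volume_ul(final_volume_ml, fold_dilution):
    """Volume (µl) of stock needed to make final_volume_ml at a given fold dilution."""
    if fold_dilution <= 0:
        raise ValueError("fold dilution must be positive")
    return final_volume_ml * 1000.0 / fold_dilution


if __name__ == "__main__":
    print(load_volume_ul(7.5, 0.5))        # gel load at 0.5 mg/ml
    print(stock_volume_ul(10.0, 500.0))    # primary antibody, assumed 10 ml
    print(stock_volume_ul(10.0, 1000.0))   # secondary antibody, assumed 10 ml
```

At 0.5 mg/ml, the 7.5 µg load corresponds to 15 µl per lane.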

Proteolysis of HepG2 Cells — Mild proteolysis of HepG2 cells to test
the topology of hepsin at the cell surface was carried out as follows:
HepG2 cells (about 90% confluency) in nine 10-cm culture dishes
(total of about 4.5 x 10^ cells) were washed twice with phosphate-buffered
saline (0.15 M NaCl, 8 mM Na2HPO4, 0.6 mM KH2PO4), pH
7.4, and incubated in the buffer for 30 min on ice with or without 10
µg/ml proteinase K or 100 µg/ml bovine pancreatic trypsin. Under
these conditions, HepG2 cells did not significantly lose their viability.
Cells were then washed twice with the phosphate buffer and used for
preparing plasma membrane proteins as described above. Aliquots
(20 µg each) of protein samples were subjected to immunoblot analysis
as described above employing the affinity-purified antibody, P5.

Fluorescent Immunostaining of Cultured Cells — Cells were maintained
at 37 °C in 5% CO2 in minimum essential medium containing
10% fetal calf serum and antibiotics. Cells grown to subconfluency
on coverslips (8 wells/slide; Miles Laboratories) were fixed at room
temperature for 10 min with 2% paraformaldehyde and 0.2% glutaraldehyde
in PBS containing Ca2+ and Mg2+ (Gibco). After rinsing
several times with PBS, cells were incubated with goat serum at a
dilution of 1:20 in PBS at room temperature for 15 min to block
nonspecific binding of the antibody. After several additional rinses
with PBS, cells were incubated with purified antisynthetic peptide
IgG (2-5 µg/ml of PM which recognizes the middle portion of the
putative catalytic subunit) in PBS containing bovine serum albumin
(1 mg/ml) with and without 0.05% Triton X-100 for 2 h in humidified
Petri dishes. The bound IgG was visualized by incubating for 30 min
with goat anti-rabbit IgG labeled with fluorescein isothiocyanate
(diluted 1:50 with PBS). In control experiments: 1) the antibodies
were preincubated with synthetic peptides (1 mg/ml of PM) used for
raising antibodies before incubating with cells; or 2) PBS containing
no antibodies with or without synthetic peptides (1 mg/ml) was added
to cells; or 3) anti-hepsin antibodies were replaced with anti-human
blood coagulation factor IX. For testing any intracellular immunostaining,
cells were treated with 0.5% Triton X-100 for 3-5 min before
incubating with the antibodies (HAbPM). The cells were immediately
examined by fluorescence microscopy and photographed. In this
experiment, HAbP1 antibodies (specific for the amino-terminal region)
were not employed because their specificity was found to be low
in immunoblot analysis and they recognized not only the 51-kDa
band but also a significant number of other bands.

RNA Blot Analysis — Total RNAs of various baboon tissues were
prepared by the guanidinium isothiocyanate method described by
Chomczynski and Sacchi (24). RNA preparations (20 µg for each
tissue) were electrophoresed in a 1.5% agarose gel containing 6.7%
formaldehyde in 20 mM phosphate buffer, pH 7.0 (25). The agarose
gels were then blotted onto GeneScreen Plus membranes (Du Pont/
New England Nuclear), followed by baking for 2 h at 80 °C. A hepsin
cDNA (1.8 kb) (14) was labeled with [α-³²P]dCTP by using an






oligolabeling kit (Pharmacia) to a specific activity of about 1 x 10'^
cpm/µg. Prehybridization, hybridization with the radiolabeled cDNA
probe, and washing were carried out as described by the manufacturer
for the GeneScreen Plus membrane. The membrane was then exposed
to x-ray film (Kodak X-Omat AR) at −70 °C. A ribosomal
RNA gene probe was used to confirm the presence of RNAs in each
lane of the blot.

Molecular Mapping of the Gene Locus — A panel of somatic cell
hybrids for mapping was established by PEG 1000-mediated cell
fusion of human VA2, A549, IMR90 fibroblast or peripheral human
lymphocyte cells to either Chinese E36 or Syrian BHK-B1 hamster
cells as previously described (26). A panel of hybrids for mapping was
established after characterization for their human chromosome content
by screening up to 'M gene enzyme systems and, in selected
cases, by cytogenetic analyses. ³²P-Labeled hepsin cDNA (1.8 kb)
(1-3 X 10^* dpm/µg) was hybridized to DNA blots of these hybrids and
controls which had been digested to completion with HindIII, BamHI,
or EcoRI, electrophoresed, and blotted as described (26).

In situ chromosomal hybridization was carried out as follows:
human metaphase cells were prepared from phytohemagglutinin-stimulated
peripheral blood lymphocytes (27). A radiolabeled, hepsin-specific
cDNA probe was prepared by nick translation of the entire
plasmid with all four deoxynucleoside triphosphates ³H-labeled to a
specific activity of 1-2 X 10** dpm/µg. In situ hybridization was
performed as described previously (27). Metaphase cells were hybridized
at 2.0 and 1.0 ug of probe/ml of hybridization mixture. Autoradiographs
were exposed for 11 days.

Cell-free Transcription of Hepsin cDNA and in Vitro Translation —
Hepsin cDNA (1.8 kb) (14) was inserted into the pSG5 vector (Stratagene)
for both orientations at the unique EcoRI site downstream of
the T7 promoter. The chimeric plasmid was then transfected into
Escherichia coli TB-1 cells and amplified followed by preparation
employing the alkaline-SDS method and CsCl gradient-ultracentrifugation.
The plasmids were linearized by digestion with XbaI located
downstream of the insert in the vector sequence, followed by incubation
with proteinase K (50 µg/ml) at 37 °C for 30 min. The reaction
mixture was extracted twice with phenol/chloroform (1:1) and ethanol
precipitated prior to subjecting it to transcription reactions. The
linearized plasmid DNAs were dissolved in TE buffer (10 mM Tris-HCl,
pH 7.4, 0.1 mM EDTA prepared with diethyl pyrocarbonate-treated
water) and employed as a template for transcription reactions.
Cell-free transcription was carried out at 37 °C for 30 min with T7
RNA polymerase using an mRNA capping kit (Stratagene) according
to the manufacturer's instructions. The transcription reaction mixture
was then added to 25 units of RNase-free DNase I followed by
an additional incubation for 5 min at 37 °C. Synthesized RNA was
precipitated with ethanol after extracting once with phenol/chloroform
(1:1), dissolved in TE buffer, and employed in translation
reactions. The RNAs synthesized were quantitated by reading the
absorbance at 260 nm. The size of the RNA was determined by
formaldehyde-agarose gel electrophoresis. Generally, about 40-45 µg
of RNA (1.8 kb) were obtained from 2-5 µg of DNA template.

The prepared hepsin RNA (1-2 µg) was then subjected to translation
at 30 °C in the presence of [³⁵S]methionine by employing the
rabbit reticulocyte lysate system (New England Biolab) according to
the manufacturer's instructions. An aliquot (5 µl) of the translation
reaction mixture (25 µl) was mixed with the loading buffer, treated
in boiling water for 5 min, and subjected to SDS-polyacrylamide gel
(15%) electrophoresis. After electrophoresis, the polyacrylamide gel
was treated with Amplify (Amersham) for 15 min according to the
manufacturer's instructions to enhance the radioactivity signals,
dried, and exposed to x-ray film at −70 °C.

RESULTS

Subcellular Localization of Hepsin — Immunoblot analysis of
HepG2 as well as BHK cells is shown in Fig. 1. Based on the
5'-nucleotidase activity assayed, the plasma membrane preparation
used in this experiment was found to be enriched 18-fold
over the crude cell membrane starting material. The
membrane preparation was highly pure with little contamination
by microsomes and mitochondria, as monitored by
glucose 6-phosphatase and succinate-cytochrome c reductase
(<0.2% and 0.5% contamination, respectively). Protein bands
of 51 and 28 kDa were observed at high concentration levels
in the extracts of cell membrane fractions prepared from




Fig. 1. Immunoblot analysis of HepG2 and BHK cells.
Experimental details are described under "Experimental Procedures."
Aliquots (7.5 µg) of proteins of various cell subcomponents and media
are loaded for each lane. Lane 1, BHK cell membranes; Lane 2,
HepG2 cell membranes; Lane 3, HepG2 cytosol; Lane 4, HepG2
media. The numbers on the left show the positions of size markers.



Fig. 2. Cell-free translation assay of hepsin cDNA. Lane 1,
14C-labeled size marker proteins (from the top: myosin, γ-globulins,
phosphorylase b, bovine serum albumin, ovalbumin, carbonic
anhydrase, lactoglobulin, cytochrome c, respectively); Lane 2, no RNA
added; Lanes 3 and 4, 0.42 and 1.7 µg of in vitro transcripts (sense
strand) were added, respectively; Lanes 5 and 6, 0.42 and 1.7 µg of in
vitro transcripts (antisense strand) added, respectively; Lane 7, 1.8
µg of pS05 (no hepsin insert) transcribed RNA; Lane 8, 2 µg of
human placenta RNAs. The numbers on the left indicate the positions
and sizes of relevant size marker proteins.

HepG2 cells, while BHK cells showed only the major band
(51 kDa). These bands were competed out with the addition
of P5 (synthetic peptide of the carboxyl-terminal region of
hepsin) which was used to raise the antibodies employed in
the immunoblot analysis. These bands were also present at
reduced levels in nuclear and mitochondrial fractions (data
not shown), but neither in the cytosol nor in culture media.
The presence of hepsin in nuclei and mitochondria may be
due in part to possible cell membrane contamination in these
fractions. These results indicate that hepsin is a protein
primarily associated with the plasma membrane.

Cell-free Translation Analysis — When in vitro transcripts
of hepsin cDNA were employed in cell-free translation assays,
a specific polypeptide band of 44 kDa was observed in
SDS-polyacrylamide electrophoresis (Fig. 2). The estimated size of
the band agreed reasonably well with that expected from the
cDNA (14). The larger molecular size observed in immunoblot
analyses of all extracts may be due to the lack, in the cell-free
translation product, of potential post-translational
modifications such as glycosylation. A possible site for the
N-linked carbohydrate chain attachment is
at amino acid 112 of the hepsin molecule. Hepsin may also
contain O-linked carbohydrate chains.

Tissue Distribution of Hepsin Gene Expression — The tissue
distribution of hepsin expression was analyzed by RNA blot
analysis of total RNA samples prepared from a young adult






Characterization of Hepsin



baboon tissue including the hypothalamus, small intestine,
pancreas, testis, salivary gland, skeletal muscle, lung, adrenal
gland, thyroid, pituitary gland, liver, spleen, kidney, brain,
and thymus (Fig. 3). The results showed that the mRNA for
hepsin was 1.85 kb in size, and was found at the highest level
in the liver. It was also present in other tissues, albeit at much
lower levels, including the kidney, pancreas, lung, thyroid,
pituitary gland, as well as the testis. Extremely low levels of
the mRNA were found in the thymus, spleen, small intestine,
and in the adrenal gland. These results indicate that hepsin
is ubiquitously expressed in various tissues with preferred
tissue specificity for liver.

Chromosomal Localization of the Hepsin Gene — To obtain a
chromosome assignment for the hepsin gene, a hepsin cDNA
probe was hybridized to Southern blots of a panel of somatic
cell hybrids. The results showed perfect concordance between
human chromosome 19 and hepsin (Table I). A significant
discordance was observed between hepsin and all of the other
human chromosomes (27-59%).

To determine the chromosomal localization of the hepsin
gene using an independent method and to sublocalize this
gene, we hybridized a hepsin-specific probe (cDNA) to normal
metaphase chromosomes. This resulted in specific labeling
only of chromosome 19. Of 100 metaphase cells examined
from this hybridization, 39 were labeled on region q1 of one
or both chromosome 19 homologues. The distribution of
labeled sites on this chromosome is illustrated in Fig. 4. Of 224
total labeled sites observed, 64 (28.6%) were located on
chromosome 19. These sites were clustered at bands q11-13.2 and
this cluster represented 21.9% (49/224) of all labeled sites
(cumulative probability for the Poisson distribution is
<<0.0005). The largest number of grains was observed at
19q13.1. Similar results were obtained in three additional
hybridization experiments using this probe. Thus, the hepsin
gene is localized to chromosome 19, at bands q11-13.2.
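
The grain-count percentages quoted in this paragraph can be re-derived from the three reported counts. The following is a minimal sketch, not part of the original analysis; the variable and function names are illustrative assumptions, and the Poisson probability is not recomputed because its parameters are not given in the text.

```python
# Labeled-site counts as reported in the in situ hybridization results.
total_sites = 224     # all labeled sites observed in the metaphase cells
chr19_sites = 64      # labeled sites located on chromosome 19
cluster_sites = 49    # labeled sites clustered at bands 19q11-13.2


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Share of all labeled sites that fall on chromosome 19.
pct_chr19 = percent(chr19_sites, total_sites)
# Share of all labeled sites that fall in the q11-13.2 cluster.
pct_cluster_of_all = percent(cluster_sites, total_sites)
# Share of the chromosome 19 sites that fall in the cluster.
pct_cluster_of_chr19 = percent(cluster_sites, chr19_sites)

print(pct_chr19, pct_cluster_of_all, pct_cluster_of_chr19)
```

The three values correspond, in order, to the 28.6% and 21.9% figures in the text and the 76.6% figure in the legend to Fig. 4.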

Immunofluorescent Staining of Cultured Cells — Cultured
cells including HepG2 and BHK cells were immunostained
for hepsin with antibodies (HAbPM) raised against the
synthetic peptides (PM, an equimolar mixture of P1, P2, and P3)
designed to the catalytic subunit of hepsin (Fig. 5A). The
antibodies employed uniformly stained HepG2 cells. BHK
cells were also stained, but at reduced intensity. The staining
was completely competed out when synthetic peptides used

Fig. 3. RNA blot analysis of young adult baboon tissue. Each
lane contained 20 µg of total RNAs isolated from a young adult
baboon. Lanes 1-15 contain hypothalamus, small intestine, pancreas,
testis, salivary gland, skeletal muscle, lung, adrenal gland, thyroid,
pituitary gland, liver, spleen, kidney, brain, and thymus, respectively.
The size and positions of RNAs are shown at the right. A hepsin
cDNA (1.8 kb) was used as the radiolabeled probe in this experiment.



Table I

Synteny test of the hepsin gene and human chromosomes
in rodent-human hybrid clones

Somatic cell hybrids were scored for the presence (+) or absence
(−) of specific human chromosomes by gene-enzyme and cytogenetic
analyses and for the presence or absence of hepsin coding segment
by Southern blot hybridization.

[Table body illegible in this copy; the legible column headings are
"Human chromosome" and "Asynteny".]


Fig. 4. Distribution of labeled sites on chromosome 19 in
100 normal human metaphase cells from
phytohemagglutinin-stimulated peripheral blood lymphocytes that were
hybridized with the hepsin probe. Of 64 labeled sites observed on
chromosome 19, 49 (76.6%) were clustered at 19q11-13.2; the largest
cluster of grains was located at 19q13.1.

for raising antibodies were preincubated with antibodies,
indicating that the staining of the cells is specific (Fig. 5B).
Antibodies raised against the synthetic peptide P5 (the
carboxyl-terminal region) gave similar results (data not shown).
Cells permeabilized with Triton X-100 did not show any
significant increase or change in staining (data not shown).
When antibodies specific for blood coagulation factor IX were
used or anti-hepsin antibodies were omitted in control
experiments, no significant staining of the cells was observed. These
immunostaining patterns show that hepsin primarily has its
catalytic subunit (carboxyl-half) at the cell surface.
Consequently, its amino-terminal portion is likely to be facing the
cytosol. The immunostaining results of cultured cells as well
as tissues are consistent with this molecular orientation of
hepsin at the cell membrane. These results also agree well
with those of the immunoblot analysis which showed hepsin
to be primarily located in the cell membrane fraction. The
HAbP1 antibody which was raised against the NH2-terminal
region of hepsin did not serve to further confirm the results
because of its unfortunately low specificity.

Mild Proteolysis of HepG2 Cells — To further test the
topology of hepsin, HepG2 cells were mildly digested with trypsin
(100 µg/ml) or proteinase K (10 µg/ml) on ice. The results of
immunoblot analyses of these protein samples are shown in






Fig. 5. Fluorescent immunostaining of HepG2 cells. Panel
A, staining cells with antibodies raised against the catalytic domain
(HAbPM); Panel B, staining cells in the presence of antigen peptides.
Experimental details are described under "Experimental Procedures."




Fig. 6. Immunoblot analysis of plasma membrane proteins
prepared from HepG2 cells with and without mild proteolysis.

Lane 1, membrane proteins (20 µg) of HepG2 cells treated with
proteinase K (10 µg/ml); Lane 2, membrane proteins (20 µg) of HepG2
cells treated with trypsin (100 µg/ml); Lane 3, membrane proteins
(20 µg) of HepG2 cells without protease treatment. Bands a and c
correspond to the 51- and 28-kDa hepsin bands in Fig. 1. Band b
corresponds to partially degraded hepsin. Antibodies prepared against
the carboxyl-terminal region (HAbP5) were used in this experiment.

Fig. 6. The protein bands (a and c in control lane 3)
correspond to 51 and 28-kDa bands of hepsin in Fig. 1. When the
cells were treated with trypsin (lane 2), both bands a and c
were grossly reduced in intensity compared to the nontreated
control (lane 3). When the cells were very mildly treated with
proteinase K (10 µg/ml, lane 1), both bands a and c lowered
their intensities and a new band, b, appeared, likely derived
from band a. These results suggest that limited proteolysis,
which is mild enough to maintain cellular integrity and
viability, results in significant degradation of the
carboxyl-terminal portion (the catalytic subunit) of hepsin. This further
supports the molecular orientation of hepsin with its catalytic
subunit at the cell surface exposed to the extracellular space.

DISCUSSION 

The results of our studies demonstrate that hepsin, origi- 
nally identified as a putative membrane-bound protease, is 
present in the cell membranes. We have also characterized its 
molecular size, tissue distribution of expression, and the chro- 
mosomal localization of its gene. 

The size of the mRNA for baboon hepsin is estimated to be
about 1.85 kb. The human hepsin mRNA produced in HepG2
has a similar size and agrees well with that predicted from
the cDNA. The hepsin gene is located at 19q11-13.2. The



molecular mapping results and Southern blot analysis of 
human genomic DNA suggest that hepsin has a single copy 
gene.¹

Antibodies raised against synthetic peptides designed to 
various parts of the hepsin sequence predicted from the cDNA 
were successfully used to characterize and analyze its expres- 
sion. Immunoblot analysis of membrane proteins of HepG2 
cells showed two polypeptide bands of 51 (major) and 28 kDa 
(minor) (Fig. 1), whereas BHK cells had only the major band 
(51 kDa). This major band agrees well with the molecular 
sizes predicted from the cDNA and the cell-free translation 
experiment. The smaller minor band of 28 kDa is considered 
to be a degradation product derived from the putative catalytic 
subunit portion of the 51-kDa species. In the reduced
condition, the apparent size of the 51-kDa band increased slightly,
indicating that this band represents a single polypeptide chain
which has not undergone any degradation during the
membrane protein extraction procedures employed. We speculate
that proteolytically activated hepsin, which may be composed
of two subunits (162 and 255 amino acid residues) linked
together with a disulfide bond (14), may be efficiently cleared
from the cell membrane, since we have not seen any
significant generation of the expected subunits in the gel
electrophoretic analysis employing reduced conditions. This may
take place by binding to a specific inhibitor(s) or by
accelerated degradation due to an unknown mechanism. In vitro
translation assays of the RNA transcripts of hepsin cDNA
showed a distinct specific band of about 44 kDa that agrees
reasonably well with the size predicted from the cDNA
sequence. This size also agrees well with that observed for
cultured cells if we take into account the potential
post-translational modifications such as glycosylation which may
increase the molecular mass to the apparent 51 kDa. A
potential site for the N-linked carbohydrate chain attachment is
located at amino acid 112. At the present time, we do not
know whether or not this site is glycosylated, or whether any
O-linked carbohydrate chains are attached to the mature
hepsin molecule.
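
As a rough cross-check of the sizes discussed above, the two proposed subunits (162 and 255 residues) can be summed and converted to an approximate polypeptide mass. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not taken from this paper; the variable names are illustrative only.

```python
# Subunit lengths quoted in the text (amino acid residues).
light_chain_residues = 162
heavy_chain_residues = 255

# ASSUMPTION: ~110 Da per residue is a generic approximation for
# unmodified polypeptides; it is not a value reported in this paper.
AVERAGE_RESIDUE_MASS_DA = 110

total_residues = light_chain_residues + heavy_chain_residues
estimated_kda = total_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0

# Sizes reported in the text, in kDa, for comparison.
cell_free_band_kda = 44
immunoblot_band_kda = 51

print(total_residues, estimated_kda)
print(estimated_kda - cell_free_band_kda, immunoblot_band_kda - estimated_kda)
```

Under that assumption the unmodified chain comes out in the mid-40 kDa range, close to the 44-kDa cell-free translation product and below the 51 kDa seen in immunoblots, which is in line with the suggestion that glycosylation may account for the difference.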

As shown in RNA blot analysis of baboon tissues (Fig. 3),
hepsin appears to be ubiquitously expressed in various tissues,
particularly in the liver, at a high level. The expression of
hepsin in various tissues suggests that this protease may be
involved in an essential biological process(es) in many
different cells. In HepG2 cells, hepsin is present in the cell
membrane fraction at high levels, but not in the cytosol or in
culture media (Fig. 1). Nuclear and mitochondrial fractions
also contained a lower amount of hepsin of the same molecular
weights (data not shown). The results of fluorescent
immunostaining experiments show that hepsin is primarily a cell
membrane-associated protease with the molecular orientation
of its catalytic subunit (the carboxyl-terminal half) at the cell
surface. The patterns of the fluorescent immunostaining of
various tissues are consistent with this molecular orientation.
The observation that mild protease treatment of intact HepG2
cells greatly decreases the intensity of hepsin bands as tested
by immunoblot analysis (Fig. 6) further supports the
molecular orientation. When the sequences of 15 amino acid residues
which immediately flank the hydrophobic sequence of hepsin
were compared, the NH2-terminal side flanking sequence
contained 4 positive net charges while the COOH-terminal
flanking side contained no net charges. This agrees well with
the consensus topological sequence for the type II membrane
proteins derived from well-defined membrane-spanning
proteins (28-30). Furthermore, the immediately flanking residue
of the NH2-terminal side of the hydrophobic sequence is a






¹ A. Tsuji and K. Kurachi, unpublished data.






positively charged residue, lysine, agreeing well with the
consensus sequence for topology of the type II membrane proteins
recently proposed by Parks and Lamb (31). These
observations support the premise that the mechanism of intracellular
transportation of the newly synthesized hepsin is analogous
to that of other reported membrane-bound proteins.

Several proteases with a similar cellular localization and 
orientation have been reported (8, 11, 13). Hepsin, however, 
is novel and distinct from each of these proteases reported to 
date. 

Proteases have been shown to be present during cell migra- 
tion (32) and tissue rearrangement (33) involved in morpho- 
genesis, where it has been assumed that they create space for 
cell migration and process extension through an extracellular 
matrix and cell-filled milieu. Their role in cell growth can be 
inferred from their presence, for example, on immature but 
not mature glial cells (34) or the highly developmentally 
regulated appearance of tissue plasminogen activator in
maturing sperm (35). Although the precise biological role(s) of
hepsin is unknown at the present time, we postulate that 
hepsin also plays an important role(s) in cell growth, probably 
by creating space for growing cells by degrading a specific 
extracellular matrix protein(s) or a protein(s) in the tissue. 
In this regard, it is important to note our recent observation 
that hepsin is expressed at a greatly elevated level in actively 
dividing cells in such tissues as the basal layer of the epidermis 
of developing skin.^ Hepsin may also have a role in other cell 
functions in normal as well as in pathological conditions. In 
our preliminary results, antisense oligonucleotides of hepsin 
show a significant effect on the growth rate as well as on the 
morphology of HepG2 and BHK cells in culture, supporting
the above hypothesis.^ Hepsin may also play an important 
role in the metastasis of tumor tissues like some other mem- 
brane proteases (13); however, this has yet to be tested. 

Determination of the substrate specificity of hepsin is
obviously very important in order to define its precise biological
role(s). In our preliminary assay, hepsin highly enriched on
the antibody affinity column showed strong activity towards
N-benzoyl-Leu-Ser-Arg-pNA·HCl, but it did not cleave
N-benzoyl-Glu-Phe-Ser-Arg-pNA·HCl. To this end, efforts to
isolate hepsin in quantity from cultured cells and tissues are in
progress. Determination of its concentration in various tumor
tissues is also in progress in our laboratory.

Acknowledgments — We thank L. H. J. Lin, Rafael Espinosa III,
Matt Rebentisch, and L. Landa for their excellent technical
assistance. We also thank Dr. Amiya K. Hajra for his help in assaying the
membrane preparations and Dr. Richard E. Tashian for his critical 
reading of the manuscript. C. Herrerias is also thanked for her 
excellent help in preparing the manuscript. 

REFERENCES 

1. Neurath, H. (1986) Fed. Proc. 44, 2907-2913

2. Bond, J. S., and Butler, P. E. (1987) Annu. Rev. Biochem. 56, 333-364



^ S. O'Shea, A. Tsuji, and K. Kurachi, submitted for publication. 



3. Katunuma, N., and Kominami, E. (eds) (1989) Intracellular Proteolysis, Mechanisms and Regulations, Proceedings of 7th International Committee on Proteases Meetings, Shimoda, Japan. Japan Scientific Societies Press, Tokyo

4. Billings, P. C., Carew, J. A., Keller-McGandy, C. E., Goldberg, A. L., and Kennedy, A. R. (1987) Proc. Natl. Acad. Sci. U. S. A. 84, 4801-4805

5. Scott, G. K., Seow, H. F., and Tse, C. A. (1989) Biochim. Biophys. Acta 1010, 160-165

6. Scott, G. K., and Seow, H. F. (1986) Exp. Cell Res. 158, 41-52

7. Fraser, J. D., and Scott, G. K. (1984) Mol. Immunol. 21, 311-320

8. Steven, F. S., and Al-Ahmad, R. K. (1983) Eur. J. Biochem. 130, 335-339

9. Steven, F. S., Griffin, M. M., and Al-Ahmad, R. K. (1986) J. Chromatogr. 376, 211-219

10. Steven, F. S., and Griffin, M. M. (1988) Biol. Chem. Hoppe-Seyler 369, (suppl.) 137-143

11. Tanaka, K., Nakamura, T., and Ichihara, A. (1986) J. Biol. Chem. 261, 2610-2615

12. Satoh, M., Yokosawa, H., and Ishii, S.-I. (1988) J. Biochem. (Tokyo) 103, 493-498

13. Aoyama, A., and Chen, W. T. (1990) Proc. Natl. Acad. Sci. U. S. A. 87, 8296-8300

14. Leytus, S. P., Loeb, K. R., Hagen, F. S., Kurachi, K., and Davie, E. W. (1988) Biochemistry 27, 1067-1074

15. Kurachi, K., Tsuji, A., and O'Shea, K. S. (1989) Intracellular Proteolysis, Mechanisms and Regulations, Proceedings of 7th International Committee on Proteases Meetings, Shimoda, Japan. (Katunuma, N., and Kominami, E., eds) pp. 144-149, Japan Scientific Societies Press, Tokyo

16. Reichlin, M. (1980) Methods Enzymol. 70(A), 159-165

17. Belsham, G. J., Denton, R. M., and Tanner, M. J. A. (1980) Biochem. J. 192, 457-467

18. Widnell, C. C., and Unkeless, J. C. (1968) Proc. Natl. Acad. Sci. U. S. A. 61, 1050-1057

19. Sottocasa, G. L., Kuylenstierna, B., Ernster, L., and Bergstrand, A. (1967) J. Cell Biol. 32, 415-438

20. Fleming, P. J., and Hajra, A. K. (1977) J. Biol. Chem. 252, 1663-1672

21. Blobel, G., and Potter, V. R. (1966) Science 154, 1662-1666

22. Bradford, M. M. (1976) Anal. Biochem. 72, 248-254

23. Towbin, H., Staehelin, T., and Gordon, J. (1979) Proc. Natl. Acad. Sci. U. S. A. 76, 4350-4354

24. Chomczynski, P., and Sacchi, N. (1987) Anal. Biochem. 162, 156-159

25. Kusumoto, H., Hirosawa, S., Salier, J. P., Hagen, F. S., and Kurachi, K. (1988) Proc. Natl. Acad. Sci. U. S. A. 86, 7307-7311

26. Le Beau, M. M., Pettenati, M. J., Lemons, R. S., Diaz, M. O., Westbrook, C. A., Larson, R. A., Sherr, C. J., and Rowley, J. D. (1988) Cold Spring Harbor Symp. Quant. Biol. 51, 899-909

27. Le Beau, M. M., Westbrook, C. A., Diaz, M. O., and Rowley, J. D. (1984) Nature 312, 70-71

28. Hartmann, E., Rapoport, T. A., and Lodish, H. F. (1989) Proc. Natl. Acad. Sci. U. S. A. 86, 5786-5790

29. von Heijne, G. (1986) EMBO J. 5, 3021-3027

30. Boyd, D., and Beckwith, J. (1990) Cell 62, 1031-1033

31. Parks, G. D., and Lamb, R. A. (1991) Cell 64, 777-787

32. Valinsky, J. E., and LeDouarin, N. M. (1985) EMBO J. 4, 1403-1406

33. Saksela, O., and Rifkin, D. B. (1988) Annu. Rev. Cell Biol. 4, 93-126

34. Kalderon, N., and Williams, C. A. (1986) Dev. Brain Res. 25, 1-9

35. Huarte, J., Belin, D., Bosco, D., Sappino, A.-P., and Vassalli, J. D. (1987) J. Cell Biol. 104, 1281-1289





Exhibit 38



The Journal of Biological Chemistry

© 1997 by The American Society for Biochemistry and Molecular Biology, Inc.



Vol. 272, No. 50, Issue of December 12, pp. 31315-31320, 1997

Printed in U.S.A.



Identification and Cloning of the Membrane-associated Serine 
Protease, Hepsin, from Mouse Preimplantation Embryos* 

(Received for publication, August 5, 1997, and in revised form, October 2, 1997) 

Thien-Khai H. Vu$§, Rose W. Liut, Carol J. HaaksmaH, James J. Tomasekll, and Eric W. Howard^ 

From the Departments of%Pathology and Anatomical Sciences, University of Oklahoma Health Sciences Center, 
Oklahoma City, Oklahoma 73190 and the ^Center for Assisted Reproductive Technology, Columbia-Presbyterian Hospital, 
Oklahoma City, Oklahoma 73104 



Previous studies have suggested the existence of a 
membrane-associated serine protease expressed by 
mammalian preimplantation embryos. In this study, we 
have identified hepsin, a type II transmembrane serine 
protease, in early mouse blastocysts. Mouse hepsin was 
highly homologous to the previously identified human 
and rat cDNAs. Two isoforms, differing in their cytoplasmic
domains, were detected. The tissue distribution of
mouse hepsin was similar to that seen in humans, with 
prominent expression in liver and kidney. In mouse em- 
bryos, hepsin expression was observed in the two-cell 
stage, reached a maximal level at the early blastocyst 
stage, and decreased subsequent to blastocyst hatching. 
Expression of a soluble form of hepsin revealed its abil- 
ity to autoactivate in a concentration-dependent man- 
ner. Catalytically inactive soluble hepsin was unable to 
autoactivate. These results suggest that hepsin may be 
the first serine protease expressed during mammalian 
development, making its ability to autoactivate critical 
to its function. 



Embryonic development is marked by a series of cellular 
divisions and morphogenetic changes (1). These processes are 
mediated by the complex expression and interplay of different 
sets of genes, some of which are derived from maternally
expressed genes stored as mRNAs in the oocytes. It is generally
accepted that zygotic gene expression begins at the embryonic 
two-cell stage (2). These newly expressed zygotic genes comple- 
ment the maternally expressed genes to mediate early preim- 
plantation development. Numerous studies have suggested the 
involvement of a variety of proteases during development. 
Members of the astacin family of metalloproteases are involved 
in hatching in both invertebrates and vertebrates (3-6),
pattern and tentacle cell formation in the hydra by HMP1 (7),
neuroblast migration in Caenorhabditis elegans by hch-1 (3), 
dorsal/ventral patterning in Drosophila by Tolloid (8), and bio- 
mineralization and bone/cartilage formation in mammals by 
BMP-1 (9, 10). Interestingly, both Tolloid and BMP-1 can
physically interact with transforming growth factor-β (8, 9), and
this association is essential for normal development, perhaps to
activate latent transforming growth factor-β complexes. In
addition, BMP-1 has been shown to be the procollagen
C-endopeptidase (EC 3.4.24.19) required for the processing of type I, II,
and III procollagen to fibrillar collagens to yield the major
fibrous components of vertebrate extracellular matrix (11, 12).



* This work was supported by the Oklahoma Center for the
Advancement of Science and Technology Grant HR4-098. The costs of
publication of this article were defrayed in part by the payment of page
charges. This article must therefore be hereby marked "advertisement"
in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to
the GenBank™/EBI Data Bank with accession number(s) AF030065.

§ To whom correspondence should be addressed: Dept. of Pathology,
Biomedical Sciences Bldg., Rm. 434, University of Oklahoma, Health
Sciences Center, 940 Stanton L. Young Blvd., Oklahoma City, OK
73190. Tel.: 405-271-1483; Fax: 405-271-8774.

Proteases have also been shown to play essential roles in cell 
differentiation. Recently, new members of the
adamalysin/reprolysin metalloprotease family have been described and were shown
to have a direct role in a number of developmental processes.
Fertilin-α and -β, the first members of this family, have been
shown to have essential roles in sperm-egg fusion during
fertilization (13-15). The recent discovery of meltrin-α, a
fertilin-related member of the adamalysin/reprolysin metalloproteases
important for myoblast fusion during skeletal muscle develop- 
ment, suggests that there may be a common mechanism in 
gamete and myoblast fusion (16). Astacin-like proteases of the 
Tolloid/BMP-1 family play important roles in cell differentia- 
tion and morphogenesis in animal embryos ranging from the 
hydra and sea urchins to mammals (17). 

Serine proteases have also been implicated in development,
which is exemplified by genetic studies of the products of the
Drosophila gene stubble-stubbloid, which is essential for
epithelial morphogenesis of imaginal discs of Drosophila (18).
Mutations in this gene affect imaginal disc formation and
the organization of microfilament bundles, leading up to
defects in bristle, leg, and wing morphogenesis. Also in
Drosophila, the maternally transcribed product of the easter gene,
a trypsin-like serine protease, is essential for the
establishment of a normal dorsal-ventral pattern in the embryos (19). Of
note, perturbing quantitatively the level of Easter protease
activity in Drosophila as a result of dominant mutations can
disrupt the dorsal-ventral axis, leading to ventralizing and
lateralizing phenotypes (20). The Drosophila trypsin-like
enzymes easter and snake are part of a cascade of zymogen
activation leading up to the conversion of the ligand precursor,
spatzle, to its active form (21-23). Active spatzle then activates
its receptor Toll to affect specification of dorsal and lateral cell
fates (24, 25).

While evidence exists that one or more serine proteases exist 
in mammalian preimplantation embryos, the identity of these 
enzymes has remained elusive. One of the earliest events in 
embryogenesis thought to require a protease is blastocyst 
hatching. This involves the proteolysis of the zona pellucida, an 
event critical for subsequent uterine implantation of the em- 
bryo. Studies have suggested that a single, membrane-associ- 
ated serine protease is expressed by hatching blastocysts (26). 
In this study, we identify hepsin, a serine protease containing 
a transmembrane domain, as a serine protease expressed by 
mouse embryos at the two-cell stage through the early blasto- 
cyst stage. In addition, we demonstrate that a soluble form of 
hepsin lacking the transmembrane domain undergoes autoac- 
tivation, suggesting a mechanism by which hepsin becomes 
proteolytically activated in the absence of other proteases. 



This paper is available on line at http://www.jbc.org 






Hepsin in Preimplantation Embryos 



[Figure residue: a cDNA nucleotide sequence with the deduced amino acid
sequence in single-letter code above each codon, with nucleotide and
residue numbers at the margins. The individual codons and residues are
not legible in this copy.]


T 
ACC 


K 
AAA 


V 
GTC 


T 
ACT 


D 
GAC 


F 
TTC 


R 
CGG 


396 


1393 


B 
GAG 


W 

TGG 


I 
ATC 


F 
TTC 


K 
AAG 


A 
GCC 


I 

ATA 


K 
AAG 


T 
ACT 


H 
CAC 


S 
TCC 


E 
GAA 


A 
GCC 


S 
AGT 


O 
CGC 


M 
ATG 


V 
GTG 


T 
ACT 


Q 
CAG 


P 
CCC 


scop 
TGA TCC 


416 



1459 CGCCTCATCTCGCTGCTCCGTGCTGCACTAGCATCCAGAGTCAGAGTTGGTCTGGTGGCTCCAGCCCCACGTGGTAGGCTCCACACT 

1546 GGGCCTCACATGCAATGGTTTCCTGCTCAGATCCAGTCCACGGGTCCAAGGATGCTGGATCCAAGGACTTCTCTTCCA 

1633 GCCCACTCAATCCCAGGGCCATTGGCCTCACCCTCCCACCCCATGTAAATATTACTCTGTCCTCTGGGGGGCGCTCTAGGGAGCCCC 

1720 TTGTOC AG ATG CTCTTTAAATAATAAAGGTGGTTTTG ATTAATGGG ACAAAAAAAAAAAAAA 



Fig. 1. Nucleotide and predicted amino acid sequences of mouse hepsin. The internal signal peptide sequence serving as a transmembrane
(TM) domain is underlined. The zymogen activation cleavage site (arrow), catalytic triad residues (asterisks), and Asp346 (circle) are depicted.



EXPERIMENTAL PROCEDURES 

Collection and Culture of Mouse Preimplantation Embryos — Experi- 
ments utilizing preimplantation embryos were performed with cultured 
two-cell stage embryos, which were obtained from B6C3F1 prepubescent
female mice (Charles River Lab) weighing 10-13 g. Mice were
injected intraperitoneally with 5 IU of pregnant mare's serum gonadotropin
(Sigma) followed 48 h later with 5 IU of human chorionic gonadotropin
(Sigma). Subsequently, a single female was paired with a
single male overnight, and females were checked for vaginal plugs the
following day (day 1). On day 2, mice were dissected to obtain the
oviducts, which were bathed in sperm washing medium (Irvine Scientific)
and dissected to release the two-cell embryos. About 40-50 two-cell
embryos were pooled and cultured under oil at 37 °C in a humidified
atmosphere of 5% CO2 in air in 50-µl droplets of human tubal fluid
(Irvine Scientific) plus 0.5% human serum albumin (Irvine Scientific).
Cultures were maintained for 4-5 days or until expanded blastocysts
began to hatch.

RNA Isolation and First-strand cDNA Synthesis — Total RNA was 
isolated from 100-200 hatching blastocysts (embryonic day 4.5), according
to the method of Chomczynski and Sacchi (27). The total amount of
RNA obtained was then used in the first-strand cDNA synthesis reaction
using SuperScript reverse transcriptase (Life Technologies, Inc.)
and oligo(dT) as primer. The reaction was incubated at 42 °C for 1 h.
Subsequently, RNase H (Life Technologies, Inc.) was added and the
reaction was incubated at 37 °C for 20 min to remove the RNA
template.

PCR¹ Amplification, Cloning, and Sequencing of Mouse Hepsin — To
identify the serine protease involved in mouse blastocyst hatching,
degenerate oligonucleotides, 5'-TGCTCTAGATGG(A/G)TINTI(A/T)(G/C)IGCIGCICA-3'
and 5'-CCGGAATTCA(A/G)IGGI(G/C)(A/C/T)ICCI(G/C)(A/T)(A/G)TCICC-3'
(Molecular Biology Resource Facility, OUHSC),
based on two conserved regions of known serine proteases, were used to
amplify a 500-bp DNA fragment, encoding part of the protease catalytic
domain, from hatching blastocyst RNA. Aliquots of first-strand cDNA
were incubated in the presence of 0.1 µM each of the 5'- and 3'-primers, 100
µM dNTP, 1× PCR buffer, and 2.5 units/100 µl of AmpliTaq DNA
polymerase (Perkin-Elmer). The reactions were cycled 40 times through
the following steps: 30 s at 94 °C, 30 s at 55 °C, and 1 min at 72 °C in
a Perkin-Elmer DNA thermocycler model 2800. DNA fragments of the
correct size (~500 bp) were purified from agarose gels using GeneClean
II (BIO 101 Inc., Vista, CA). The purified fragments were ligated into
pBS-SK+ (Stratagene) using T4 DNA ligase (New England Biolabs).
Double-stranded DNA was sequenced using T3 and T7 primers and the
Sequenase Version 1 kit (U. S. Biochemical Corp./Amersham Life Science).
Sequences of cloned PCR fragments were compared with DNA
sequences compiled in databases.

¹ The abbreviations used are: PCR, polymerase chain reaction; bp,
base pair(s); kb, kilobase pair(s); RT, reverse transcriptase; PAGE,
polyacrylamide gel electrophoresis; fVII, factor VII; fVIIa, activated factor VII;
pBS, pBluescript.

A full-length cDNA of mouse hepsin was subsequently cloned by
screening a mouse liver cDNA library (Stratagene), following the
manufacturer's instructions. 32P-labeled DNA probes were generated using the
Prime-It II random primer labeling kit (Stratagene) and the 500-bp
cloned PCR fragment described above as a template. The 1.8-kb cDNA
obtained was sequenced as described above using both pBluescript and
internal primers.
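
As a rough illustration of what the degenerate primer notation in the PCR section implies, the sketch below counts how many distinct sequences the forward primer represents. It assumes that "I" denotes inosine (treated as one universal base that adds no degeneracy), that "N" means any of four bases, and that parenthesized alternatives are equimolar mixes. The function names are hypothetical. The check for TCTAGA relies on the standard XbaI recognition sequence, which is general knowledge and not a statement from the paper.

```python
import re

# Forward degenerate primer as printed in the text; "I" is assumed to be inosine.
FORWARD = "TGCTCTAGATGG(A/G)TINTI(A/T)(G/C)IGCIGCICA"

# One token is either a parenthesized set of alternatives or a single symbol.
TOKEN = re.compile(r"\([ACGT/]+\)|[ACGTNI]")


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    total = 1
    for token in TOKEN.findall(primer):
        if token.startswith("("):
            total *= len(token[1:-1].split("/"))  # e.g. (A/G) contributes 2
        elif token == "N":
            total *= 4  # any of A, C, G, T
        # Plain bases and inosine contribute a factor of 1.
    return total


def primer_length(primer: str) -> int:
    """Length in nucleotides, counting each alternative set as one position."""
    return len(TOKEN.findall(primer))


print(degeneracy(FORWARD))     # 32
print(primer_length(FORWARD))  # 29
print("TCTAGA" in FORWARD)     # True: an XbaI site sits near the 5' end
```

Under these assumptions the forward primer is a 32-fold degenerate 29-mer; the inosines keep the degeneracy low at positions where any base must be tolerated.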

Construction and Expression of Soluble Hepsin and Catalytically 
Inactive Hepsin — The method of site-directed mutagenesis as described 
previously (28, 29) was used to introduce a StuI restriction site at the
end of the coding sequence of the transmembrane domain of hepsin
using the oligonucleotide 5'-GTGACCATCCTAAGGCCTAGTGACCAGGAGCC-3',
which replaced nucleotides 331-336 with a StuI site.



Hepsin in Preimplantation Embryos 



31317 



Mouse  MAKEGGRTAACCSRPKVAALIVGTLLFLTGIGAASWAIVTILLQSDQEPLYQVQLSPG  58
Mouse  DSRLAVLDKTEGTWRLLCSSRSNARVAGLGCEEMGFLRALAHSELDVRTAGANGTSGFFC  118
Mouse  VDEGGLPLAQRLLDVISVCDCPRGRFLTATCQDCGRRKLPVDRIVGGQDSSLGRWPWQVS  178
Mouse  LRYDGTHLCGGSLLSGDWVLTAAHCFPERNRVLSRWRVFAGAVARTSPHAVQLGVQAVIY  238
Mouse  HGGYLPFRDPTIDENSNDIALVHLSSSLPLTEYIQPVCLPAAGQALVDGKVCTVTGWGNT  298
Mouse  QFYGQQAMVLQEARVPIISNEVCNSPDFYGNQIKPKMFCAGYPEGGIDACQGDSGGPF  356
Mouse  VCEDSISGTSRWRLCGIVSWGTGCALARKPGVYTKVTDFREWIFKAIKTHSEASGMVTQP*  417

[The rat and human rows, which list only the residues that differ from mouse, are not recoverable from the scan. Their row-end numbers are 58, 118, 178, 238, 298, 356, 417 (rat) and 59, 119, 179, 239, 299, 357, 418 (human).]

Fig. 2. Sequence alignment of mouse, rat, and human hepsin. Deduced amino acid sequences of mouse, rat, and human hepsin are shown.
Amino acid identity is indicated by a dash. The conserved TM domain and Asp346 are boxed.



This StuI site and the XbaI site at the 3' end of the cDNA in pBS-SK+
were used to excise a 1.1-kb DNA fragment, which was cloned into the same
sites in the RSV-PL4 expression vector (30). This construct included a
transferrin signal peptide, followed by an amino-terminal epitope tag
recognized by HPC4, a calcium-dependent monoclonal antibody (31).
The soluble hepsin expressed using this vector had a new amino
terminus of Glu-Asp-Gln-Val-Asp-Pro-Arg-Leu-Ile-Asp-Gly-Lys-Ile-Glu-Gly-Ser-Pro,
followed by the wild-type hepsin sequence from Ser45. The
non-functional S348A soluble hepsin mutant, which replaced the active
site serine with an alanine, was constructed similarly with the additional
use of the oligonucleotide 5'-TGCCAGGGCGACGCTGGGGGCCCCTTTGTG-3'.
The resulting constructs were transfected into human
293 epithelial cells using LipofectAMINE (Life Technologies, Inc.) as
suggested by the manufacturer. High expressing clones were selected
using 400 µg/ml G418 (Life Technologies, Inc.). The accuracy of the
constructs was confirmed by DNA sequencing. The recombinant
epitope-tagged protein was purified from conditioned medium by affinity
chromatography using HPC4-linked Affi-Gel 10 and was eluted with
EDTA.
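
The two mutagenic oligonucleotides above can be checked by translating them in frame. The following is a minimal sketch under stated assumptions: the reading frame starts at the first base of each oligonucleotide, the one garbled base in the scanned StuI oligonucleotide is read as G (which yields the StuI recognition sequence AGGCCT, general knowledge rather than a statement from the paper), and the initiating ATG lies at nucleotide 205 as labeled in Fig. 1. The helper names are hypothetical.

```python
# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate from the first base; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def codon_start(residue: int, first_coding_nt: int = 205) -> int:
    """Nucleotide number of the first base of a codon (ATG assumed at 205)."""
    return first_coding_nt + 3 * (residue - 1)


STUI_OLIGO = "GTGACCATCCTAAGGCCTAGTGACCAGGAGCC"   # garbled base assumed to be G
ACTIVE_SITE_OLIGO = "TGCCAGGGCGACGCTGGGGGCCCCTTTGTG"
STUI_SITE = "AGGCCT"

print(STUI_SITE in STUI_OLIGO)               # True
print(translate(STUI_OLIGO))                 # VTILRPSDQE
print(translate(ACTIVE_SITE_OLIGO))          # CQGDAGGPFV
print(codon_start(43), codon_start(44) + 2)  # 331 336
```

On this reading, the StuI site occupies codons 43 and 44 (nucleotides 331-336) and is followed directly by the Ser-Asp-Gln-Glu of the wild-type sequence, while the second oligonucleotide places an alanine codon (GCT) where the active-site serine sits in the Gly-Asp-Ser-Gly-Gly-Pro motif.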

Assay of Soluble Hepsin Activity — Soluble hepsin amidolytic activity 
was assayed using the chromogenic substrate Spectrozyme PCa
(H-D-(Cbo)-Lys-Pro-Arg-pNA; American Diagnostica) at a final concentration
of 0.2 mM. The absorbance at 405 nm was monitored over 10 min using
a Vmax microplate reader (Molecular Devices) to determine the rate of
chromogenic substrate hydrolysis (ΔA405/min). Inhibitory dose-response
curves were generated by preincubating the enzyme with specific
inhibitors at different concentrations for 30 min at ambient temperature
prior to the addition of the substrate.
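
The rate ΔA405/min described above is the slope of absorbance against time. A minimal sketch of that calculation follows, using an ordinary least-squares fit; the readings are invented for illustration and are not data from the paper, and the function names are hypothetical.

```python
def slope_per_min(times_min, absorbances):
    """Least-squares slope of absorbance versus time (absorbance units per min)."""
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_a = sum(absorbances) / n
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_min, absorbances))
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    return sxy / sxx


def percent_activity(rate_with_inhibitor, rate_without_inhibitor):
    """Residual activity at one inhibitor concentration, as a percentage."""
    return 100.0 * rate_with_inhibitor / rate_without_inhibitor


# Hypothetical A405 readings taken every 2 min over the 10-min window.
times = [0, 2, 4, 6, 8, 10]
a405 = [0.050, 0.090, 0.130, 0.170, 0.210, 0.250]

rate = slope_per_min(times, a405)
print(round(rate, 4))  # 0.02
```

Repeating the fit at each inhibitor concentration and passing the rates to `percent_activity` gives the points of an inhibitory dose-response curve.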

Semiquantitative RT-PCR and Southern Blot Analysis — RT-PCR-linked
Southern blot analysis, which augments the sensitivity of detection, was
used to investigate the temporal expression of hepsin in mouse
preimplantation embryos. cDNAs from various stages of development
were prepared from 40 to 50 embryos as described above. Oocytes were
prepared from unmated females and treated with hyaluronidase (Sigma)
to remove cumulus cells before proceeding to the total RNA isolation
and cDNA synthesis as above. PCR was performed essentially as
above with the mouse hepsin primers, 5'-ATCCAGCCAGTGTGTCTCCCTG-3'
and 5'-TCAGGGCTGAGTCACCATGCCAC-3', but with only
15 cycles. Similar PCR reactions using β-actin primers (a gift from Jeff
Gimble, Department of Surgery, University of Oklahoma Health Sciences
Center) were used as positive controls. Southern blot analysis of
the PCR products was performed as described previously (30) using
32P-labeled random-primed DNA probes generated from the same amplified
DNA regions as templates.
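
To see where the reverse hepsin primer anneals, it helps to read it on the sense strand. The sketch below reverse-complements the primer as printed in the text; the only assumption is that both primers are written 5' to 3', and the function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


FORWARD_PRIMER = "ATCCAGCCAGTGTGTCTCCCTG"
REVERSE_PRIMER = "TCAGGGCTGAGTCACCATGCCAC"

sense = reverse_complement(REVERSE_PRIMER)
print(sense)                  # GTGGCATGGTGACTCAGCCCTGA
print(sense.endswith("TGA"))  # True
```

The sense-strand reading ends in GTG ACT CAG CCC TGA, which matches the last codons and the stop codon shown at the end of the coding sequence in Fig. 1, so the reverse primer appears to sit at the 3' end of the open reading frame.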

Northern Blot Analysis — Total RNA was isolated from cells according 
to published methods (27). RNA was transferred to MSI-NT nylon
membranes by capillary action, then cross-linked to membranes with
UV light. Membranes were incubated for 1 h at 60 °C with prehybridization
buffer (500 mM NaPO4, pH 7.4, 7% SDS, 1 mM EDTA). Membranes
were then hybridized overnight in prehybridization buffer plus
labeled cDNA probe at 60 °C. Probes were 32P-labeled by random priming
using a Prime-It II kit (Stratagene), then separated from unincorporated
label using ProbeQuant G-50 Micro columns (Pharmacia Biotech).
Following three low stringency washes (15 min in 40 mM NaPO4,
pH 7.2, 5% SDS, 1 mM EDTA, 0.5% bovine serum albumin at room
temperature), two high stringency washes (15 min with 40 mM
NaPO4, pH 7.2, 1% SDS, 1 mM EDTA at 60 °C), and one 30-min
high-stringency wash, membranes were exposed to x-ray film adjacent to an
enhancing screen.

Fig. 3. Temporal expression of hepsin in mouse preimplantation
embryos. Total RNA from mouse embryos was isolated, then
analyzed for hepsin mRNA expression by Southern blot-linked RT-PCR
analysis (n = 3). β-Actin was used as a control.

RESULTS 

Strategy for the Identification and Cloning of an Embryonic 
Serine Protease — A prior study using a radioiodinated active 
site chloromethyl ketone probe and SDS-PAGE detected a single
serine protease of Mr = 74,000 in mouse blastocyst lysates
(26). Using RT-PCR and degenerate oligonucleotides based on
conserved regions in the catalytic domain of serine proteases,
we amplified and subcloned a 0.5-kb cDNA fragment encoding
the putative mouse hatching enzyme from hatching blastocyst
mRNAs. Ten separate clones were sequenced and found to be
identical. Database searches showed that the deduced amino
acid sequence was similar to that of human hepsin, a trypsin-like
serine protease previously cloned from a liver library (33).
A full-length mouse hepsin cDNA (Fig. 1) was obtained after






screening a mouse liver library using the amplified DNA fragment
as a probe. Hepsin is a type II transmembrane protein
with an extracellular carboxyl-terminal catalytic domain (33,
34). Based on the predicted amino acid sequence homology with
other related serine proteases, hepsin is likely to be synthesized
as a single-chain zymogen that requires cleavage of the
Arg161-Ile162 bond to generate the mature, disulfide-linked two-chain
form. In addition to the catalytic triad residues and
Asp346, which is important for trypsin-like specificity, the
transmembrane and short cytoplasmic domains of hepsin are
all conserved among mouse, rat, and human hepsin (Fig. 2).
The significance of the transmembrane domain remains to be
determined.

Temporal Expression of Hepsin in Preimplantation Embryos — To
determine if the temporal expression of hepsin was
consistent with that of a hatching enzyme, we performed semiquantitative
RT-PCR-linked Southern blotting to indirectly determine
the time and level of hepsin message in oocytes and in
several stages of preimplantation development. Hepsin transcription
was biphasic, beginning at the 2-cell stage, absent at
the 8-cell stage, and peaking at the early blastocyst stage prior
to hatching (Fig. 3). There was no detectable expression in
oocytes, and, subsequent to embryo hatching, the level of expression
clearly diminished (Fig. 3).

Tissue Expression and Multiple Hepsin mRNAs — Human 
hepsin was previously shown to be expressed primarily in liver 
and kidney, and mouse hepsin was similarly distributed (Fig. 
4). Unlike human hepsin, mouse hepsin had two alternative 
forms detected by Northern blotting, migrating at 1.8 and 1.9 
kb. To characterize the differences in the two hepsin mRNAs, 
we performed RT-PCR analysis using total RNA samples iso- 
lated from mouse liver and kidney. Several oligonucleotide 
primers spanning the hepsin cDNA sequence were utilized, as 
shown in Fig. 5. PCR analysis revealed that an insert in the






Fig. 4. Tissue distribution of mouse hepsin expression. Total 
RNA (20 µg/lane) from several adult rat tissues was analyzed for hepsin
expression by Northern blots hybridized with a cDNA consisting of the 
entire hepsin coding region. Two hybridizing species highly detected in 
liver and kidney correspond to mRNAs of approximately 1.8 and 1.9 kb 
in size. 



5'-end of the coding sequence distinguished the 1.9-kb message
from the 1.8-kb message. DNA sequencing revealed an additional
60-bp sequence coding for 20 amino acids within the
cytoplasmic domain of 1.9-kb hepsin cDNA (Fig. 6). This sequence
has not been demonstrated in human hepsin.

Expression and Autoactivation of Soluble Hepsin — Because 
hepsin is a type II transmembrane serine protease, we wanted 
to address the possibility that a soluble form of the enzyme
could be expressed and used to elucidate hepsin's enzymatic
properties. We developed an expression construct by site-directed
mutagenesis that encoded a zymogen form of hepsin
lacking its transmembrane and cytoplasmic domains (soluble
hepsin), and stably expressed it in human 293 epithelial cells.
Soluble hepsin was expected to be expressed as a single-chain
zymogen which could be activated proteolytically to a disulfide-linked
two-chain form, consisting of a 12-kDa light chain and a
31-kDa heavy chain. The intact precursor as well as the proteolytically
activated species would be expected to migrate with an
Mr of ~43,000 on SDS-PAGE gels. Surprisingly, upon elution, soluble
hepsin was spontaneously activated from a single-chain
zymogen to the active disulfide-linked two-chain form (Fig. 7,
WT lanes, and data not shown); this activation was not detected
in the conditioned medium not subjected to purification (Fig. 7,



[Fig. 5 image labels: 5', 3', B, Mr, Liver, Kidney.]

Fig. 5. Localization of the region of nucleotide insertion in the
1.9-kb hepsin message. Total RNA from both mouse kidney and liver
was subjected to RT-PCR analysis using different primer sets (each
primer is denoted by a letter from A-E) to localize the region of nucleotide
differences between the 1.8- and 1.9-kb hepsin mRNAs. The positions
of the primers (arrows) are indicated along the 5'- to 3'-nucleotide
sequence as represented by a horizontal bar above the gel image.
The position of the nucleotide insertion is also marked. PCR products
were separated by 1% agarose electrophoresis and stained with
ethidium bromide. Primer set A/B detected two different bands due to
the 60-bp insertion in the coding region for the cytoplasmic domain of
hepsin. Primers were as follows: A, 5'-TGGGAATCATTAACAAGAGTCCCTGAC-3';
B, 5'-AGTCAGGAATCGGCCTCTAGG-3';
C, 5'-AGGAAGCTGCCGGTGGACCGCATTGTG-3';
D, 5'-ATCCAGCCAGTGTGTCTCCCTG-3';
E, 5'-TCAGGGCTGAGTCACCATGCCAC-3'.



DESFGAHRaaSTCSRQPOROO 
GAT GAG GAA CCT GGO GOT CAC AGA GGA GGT TCC ACT TGT TCA AGA CCC CAA CCT AAG GOT GGC 



MAKE 
ATG GCG AAG GAG 



RTAACCSRPK 
CGO ACT OCA OCA TGC TOC TCC AOA CCC AAG 



Fig. 6. Alternative cytoplasmic domains in the two hepsin mRNAs. Amino acid and cDNA sequence of the hepsin cytoplasmic domain,
with the inserted sequence within the 1.9-kb form shown above the 1.8-kb form of hepsin.






CM lanes). Additionally, it further processed itself from a 43- to a
29-kDa form (Fig. 7, non-reduced WT lane). Upon reduction,
only a 31-kDa band, which represented the heavy or catalytic
chain, was seen, suggesting that only the light chain was proteolytically
modified to generate the 29-kDa form seen under
nonreducing conditions. The autoactivation of soluble hepsin
upon elution was not seen with a catalytically inactive S352A
soluble hepsin mutant, in which the active site serine was
replaced by alanine (Fig. 7, S352A lanes). Of note, the initial
eluate, when immediately prepared and separated by reducing
SDS-PAGE, showed only a small amount of conversion to the
two-chain form (data not shown). Similarly, the presence of the
inhibitor benzamidine in the eluate prevented the conversion,
and only a small converted fraction was seen on reducing
SDS-PAGE (data not shown).

DISCUSSION 

We have identified hepsin, a membrane-bound serine prote- 
ase previously shown to activate fVII (35), in preimplantation 
mouse embryos as early as the two-cell stage. Based on evi- 
dence that a single serine protease is present in preimplanta- 
tion embryos (26), it is possible that hepsin represents the first 
such protease expressed during development. Prior in vitro 
experimentation implicated hepsin in the maintenance of cel- 
lular morphology and hepatoma cell growth (36), and in blood 
coagulation by human factor VII activation (35). Increased 
hepsin expression has also been associated with ovarian cancer 
(37). No developmental functions of hepsin have been de- 
scribed. Whether hepsin plays a critical role in early develop- 
ment is not clear, but it is possible that it plays a role in 
blastocyst hatching. 



[Fig. 7 image labels: Mr markers at 43 kDa and 29 kDa; panels Non-Reduced and Reduced.]

Fig. 7. Soluble hepsin is capable of autoactivation. Wild-type
and S352A soluble hepsin were isolated from medium conditioned by
transfected 293 epithelial cells, and proteins were separated by both
nonreducing and reducing SDS-PAGE and blotted to nitrocellulose
membrane. The primary HFP-2 and anti-goat alkaline phosphatase-conjugated
antibodies were used to visualize hepsin in conditioned
medium (CM), as well as purified soluble hepsin (WT) and its inactive
mutant (S352A). Molecular mass markers are shown in kDa.






The hepsin amino acid sequence suggests it is a type II
transmembrane serine protease zymogen with an extracellular
carboxyl-terminal catalytic domain. The internal signal sequence,
serving as a transmembrane domain, is surprisingly
conserved. The presence of this transmembrane domain is consistent
with Perona and Wassarman's (26) data suggesting that
the putative mouse hatching enzyme, which would be expressed
in early preimplantation embryos, is membrane-bound.
The trypsin specificity-conferring Asp346, which lines the
S1 subsite and composes part of the specificity pocket, is present
and conserved, indicating that hepsin is likely to have trypsin-like
specificity. Indeed, our activity assays of the recombinant
soluble hepsin using a number of chromogenic substrates have
confirmed this observation. The reason for the presence of two
forms of hepsin, differing in the cytoplasmic domain, is not
clear. The inserted sequence in the 1.9-kb form of hepsin has no
homology to any domains found in signal transducing proteins.
It is unlikely that changes to the cytoplasmic domain alter
hepsin's proteolytic properties, particularly since the soluble
form of the enzyme is apparently fully functional. Whether the
1.8- and 1.9-kb hepsin mRNAs are the result of two different
genes or, more likely, the result of alternative splicing of a
single gene transcript remains to be defined.

Since hepsin is likely to be expressed as a zymogen based on
the predicted amino acid sequence, and appears to be the only
serine protease present during blastocyst hatching, the question
arises, what is the mechanism of its activation? Our hypothesis
is that density-dependent autoactivation occurs, as
suggested by data from our soluble hepsin expression study.
We noted that during purification, upon elution with EDTA,
soluble hepsin was spontaneously converted to the active,
disulfide-linked two-chain form, probably via cleavage of the
Arg161-Ile162 bond. The conversion was clearly concentration
dependent (activation was only seen in the eluate and not in
the diluted conditioned medium) and required hepsin's inherent
enzymatic activity since it was not observed with a catalytically
inactive S352A mutant soluble hepsin. These data
indicate that hepsin was capable of concentration-dependent
autoactivation. Since hepsin is membrane-bound via a transmembrane
domain, its density and lateral diffusion on the
trophoblast surface may play an important role in achieving
the concentration needed for autoactivation (Fig. 8). This mode
of autoactivation resembles fVII cell surface autoactivation,
which utilizes distinct tissue factor molecules to localize both
the fVII and fVIIa to the cell surface, forming two separate
membrane-bound binary complexes. The complex with the active
fVIIa then activates the adjacent tissue factor-anchoring
[Fig. 8 diagram labels: COOH, C-C, Extracellular, Cytoplasmic, NH2.]

Fig. 8. Model of hepsin activation. Based on structural similarities to other serine proteases, hepsin is expressed as a single-chain zymogen
and can be activated proteolytically by a single cleavage at the Arg161-Ile162 bond to generate the two-chain, membrane-bound form. Its deduced
primary amino acid sequence suggests that hepsin is expressed as a type II transmembrane zymogen with an extracellular carboxyl catalytic
domain. The heavy or catalytic chain is linked to the light chain via a disulfide bond (C-C). The light chain is anchored to the cell membrane by
a hydrophobic, internal signal sequence. Based on the soluble hepsin expression studies, the mode of activation on the cell surface is likely to be
autoactivation. Our evidence further suggests that a soluble form, resulting from additional cleavages of the membrane-bound light chain, is
possible.






fVII, obeying obligatory two-dimensional enzyme kinetics (38). 
Hepsin autoactivation is likely to follow similar kinetics, but 
further studies are necessary to elucidate its mechanism of cell 
surface autoactivation. Interestingly, the recent purification of
intact hepsin from rat liver microsomes also resulted in its
activation (39), but it was not clear if this was the result of 
autoactivation or of the action of another protease. Our data 
with the inactive hepsin mutant suggest that membrane-bound 
hepsin is capable of autoactivation. 
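
The concentration dependence argued above can be illustrated with a generic bimolecular autoactivation model. This is a sketch under stated assumptions, not the authors' model: active enzyme E converts zymogen Z at a rate k·[E]·[Z], total hepsin T = [E] + [Z] is constant, and a small active seed fraction is present at time zero. The rate constant, concentrations, seed fraction, and function names are all hypothetical.

```python
import math


def active_fraction(t, total, k, seed_fraction):
    """Fraction of enzyme active at time t for dE/dt = k * E * (total - E).

    This is the logistic solution with E(0) = seed_fraction * total.
    """
    e0 = seed_fraction * total
    return 1.0 / (1.0 + ((total - e0) / e0) * math.exp(-k * total * t))


def half_activation_time(total, k, seed_fraction):
    """Time at which half of the enzyme has been converted to the active form."""
    return math.log((1.0 - seed_fraction) / seed_fraction) / (k * total)


# Hypothetical comparison: dilute conditioned medium versus a 10-fold
# more concentrated eluate, same rate constant and seed fraction.
dilute = half_activation_time(total=1.0, k=1.0, seed_fraction=0.01)
concentrated = half_activation_time(total=10.0, k=1.0, seed_fraction=0.01)
print(dilute / concentrated)
```

In this model the half-activation time scales inversely with total concentration, so a 10-fold concentration step shortens it about 10-fold. That is qualitatively in line with activation being seen in the eluate but not in the dilute conditioned medium, although the actual kinetics on a membrane surface may differ.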

The autoactivation of soluble hepsin additionally generated a 
second form of the enzyme. A band of 29 kDa, which was absent 
in the S352A mutant, along with the intact 43 kDa, were both 
present when the eluate was analyzed on nonreducing SDS- 
PAGE and Western blot experiments. This 29-kDa form was 
likely to be the result of proteolytic modification of the light 
chain of the active two-chain form since only the intact cata- 
lytic heavy chain was seen under reducing conditions. The 
presence of this 29-kDa form suggests that membrane-bound 
hepsin can be cleaved off the trophoblast surfaces of embryos 
(Fig. 8). Interestingly, Sawada et al. (40) have demonstrated 
the presence of a soluble trypsin-like activity in blastocyst 
culture medium and that this activity represented that of a 
hatching enzyme. Whether this secreted trypsin-like activity 
and the 29-kDa form of hepsin are one and the same, and what 
roles it may play during embryogenesis, remain to be 
determined. 

Acknowledgments — We thank Dr. Charles Esmon for financial sup- 
port in making the antibody HFP-2, and the generous gifts of the mAb 
HPC4 and HPC4-linked Affi-Gel 10. We also thank Dr. Alireza Rezaie
for the use of RSV-PL4, the Vmax microplate reader (Molecular Devices),
and the chromogenic substrate Spectrozyme PCa (American
Diagnostica).

REFERENCES 

1. McLaren, A. (1982) Reproduction in Mammals, Cambridge University Press, Cambridge, MA
2. Nothias, J. Y., Majumder, S., Keneko, J., and DePamphilis, M. L. (1995) J. Biol. Chem. 270, 22077-22080
3. Hishida, R., Ishihara, T., Kondo, K., and Kataura, I. (1996) EMBO J. 15, 4111-4122
4. Yasumasu, S., Yamada, K., Akasaka, K., Mitsunaga, K., Iuchi, I., Shimada, H., and Yamagami, K. (1992) Dev. Biol. 153, 250-258
5. Elaroussi, M. A., and DeLuca, H. F. (1994) Biochim. Biophys. Acta 1217, 1-8
6. Katagiri, C., Maeda, R., Yamashika, C., Mita, K., Sargent, T. D., and Yasumasu, S. (1997) Int. J. Dev. Biol. 41, 19-25
7. Yan, L., Pollock, G. H., Nagase, H., and Sarras, M. P., Jr. (1995) Development 121, 1591-1602
8. Shimell, M. J., Ferguson, E. L., Childs, S. R., and O'Connor, M. B. (1991) Cell 67, 469-481
9. Wozney, J. M., Rosen, V., Celeste, A. J., Mitsock, L. M., Whitters, M. J., Kriz, R. W., Hewick, R. M., and Wang, E. A. (1988) Science 242, 1528-1534
10. Fukagawa, M., Suzuki, N., Hogan, B. L., and Jones, C. M. (1994) Dev. Biol. 163, 175-183
11. Kessler, E., Takahara, K., Biniaminov, L., Brusel, M., and Greenspan, D. S. (1996) Science 271, 360-362
12. Li, S. W., Sieron, A. L., Fertala, A., Hojima, Y., Arnold, W. V., and Prockop, D. J. (1996) Proc. Natl. Acad. Sci. U. S. A. 93, 5127-5130
13. Wolfsberg, T. G., Bazan, J. F., Blobel, C. P., Myles, D. G., Primakoff, P., and White, J. M. (1993) Proc. Natl. Acad. Sci. U. S. A. 90, 10783-10787
14. Myles, D. G., Kimmel, L. H., Blobel, C. P., White, J. M., and Primakoff, P. (1994) Proc. Natl. Acad. Sci. U. S. A. 91, 4195-4198
15. Evans, J. P., Schultz, R. M., and Kopf, G. S. (1995) J. Cell Sci. 108, 3267-3278
16. Huovila, A. P. J., Almeida, E. A., and White, J. M. (1996) Curr. Opin. Cell Biol. 8, 692-699
17. Bond, J. S., and Beynon, R. J. (1995) Protein Sci. 4, 1247-1261
18. Appel, L. F., Prout, M., Abu-Shumays, R., Hammonds, A., Garbe, J. C., Fristrom, D., and Fristrom, J. (1993) Proc. Natl. Acad. Sci. U. S. A. 90, 4937-4941
19. Chasan, R., and Anderson, K. V. (1989) Cell 58, 391-400
20. Jin, Y., and Anderson, K. V. (1990) Cell 60, 873-881
21. Stein, D., and Nusslein-Volhard, C. (1992) Cell 68, 429-440
22. Schneider, D. S., Jin, Y., Morisato, D., and Anderson, K. V. (1994) Development 120, 1243-1250
23. Smith, C. L., and DeLotto, R. (1994) Nature 368, 548-551
24. Morisato, D., and Anderson, K. V. (1994) Cell 76, 677-688
25. Morisato, D., and Anderson, K. V. (1995) Annu. Rev. Genet. 29, 371-399
26. Perona, R. M., and Wassarman, P. M. (1986) Dev. Biol. 114, 42-52
27. Chomczynski, P., and Sacchi, N. (1987) Anal. Biochem. 162, 156-159
28. Vu, T.-K. H., Hung, D. T., Wheaton, V. I., and Coughlin, S. R. (1991) Cell 64, 1057-1068
29. Vu, T.-K. H., Wheaton, V. I., Hung, D. T., Charo, I., and Coughlin, S. R. (1991) Nature 353, 674-677
30. Rezaie, A. R., and Esmon, C. T. (1992) J. Biol. Chem. 267, 26104-26109
31. Stearns, D. J., Kurosawa, S., Sims, P. J., Esmon, N. L., and Esmon, C. T. (1988) J. Biol. Chem. 263, 826-832
32. Deleted in proof
33. Leytus, S. P., Loeb, K. R., Hagen, F. S., Kurachi, K., and Davie, E. W. (1988) Biochemistry 27, 1067-1074
34. Tsuji, A., Torres-Rosado, A., Arai, T., Le Beau, M. M., Lemons, R. S., Chou, S-H., and Kurachi, K. (1991) J. Biol. Chem. 266, 16948-16953
35. Kazama, Y., Hamamoto, T., Foster, D. C., and Kisiel, W. (1995) J. Biol. Chem. 270, 66-72
36. Torres-Rosado, A., O'Shea, K. S., Tsuji, A., Chou, S-H., and Kurachi, K. (1993) Proc. Natl. Acad. Sci. U. S. A. 90, 7181-7185
37. Tanimoto, H., Yan, Y., Clarke, J., Korourian, S., Shigemasa, K., Parmley, T. H., Parham, G. P., and O'Brien, T. J. (1997) Cancer Res. 57, 2884-2887
38. Neuenschwander, P. F., Fiore, M. M., and Morrissey, J. H. (1993) J. Biol. Chem. 268, 21489-21492
39. Zhukov, A., Hellman, U., and Ingelman-Sundberg, M. (1997) Biochim. Biophys. Acta 1337, 85-95
40. Sawada, H., Yamazaki, K., and Hoshi, M. (1990) J. Exp. Zool. 253, 83-87




Exhibit 39 



[CANCER RESEARCH 60, 2602-2606, May 15, 2000]

Advances in Brief 



A Novel Transmembrane Serine Protease (TMPRSS3) Overexpressed
in Pancreatic Cancer

Christine Wallrapp, Susanne Hahnel, Friederike Müller-Pillasch, Beata Burghardt, Takeshi Iwamura,
Manuel Ruthenbürger, Markus M. Lerch, Guido Adler, and Thomas M. Gress

Department of Internal Medicine A, University of Ulm, 89081 Ulm, Germany [C. W., S. H., F. M-P., G. A., T. M. G.]; Hungarian Academy of Sciences, University of Budapest,
N50 Budapest, Hungary [B. B.]; Department of Medicine B, University of Münster, 48129 Münster, Germany [M. M. L., M. R.]; Miyazaki Medical College, Kiyotake, Miyazaki
889-1692, Japan [T. I.]



Abstract 

We report the characterization of a novel serine protease of the chy- 
motrypsin family, recently isolated by cDNA-representational difference 
analysis, as a gene overexpressed in pancreatic cancer. The 2.3-kb mRNA 
of the gene, named TMPRSS3^ is strongly expressed in a subset of pan- 
creatic cancer and various other cancer tissues, and its expression corre- 
lates with the metastatic potential of the clonal SUIT*2 pancreatic cancer 
cell lines. The deduced polypeptide sequence consists of 437 amino acids 
and exhibits all of the structural features characteristic of serine proteases 
with trypsin-like activity. TMPRSS3 is membrane bound with an NH2-terminal
signal-anchor sequence and a glycosylated extracellular region
containing the serine protease domain. Thus, TMPRSS3 is a novel mem- 
brane-bound serine protease overexpressed in cancer, which may be of 
importance for processes involved in metastasis formation and tumor 
invasion. 

Introduction 

Proteases have been increasingly recognized as important factors in 
the pathophysiology of tumorous diseases. The proteolytic degrada- 
tion of the extracellular matrix, which is an indispensable step in 
tumor invasion and metastasis, is mediated by members of the four 
major classes of endopeptidases, including serine, cysteine, aspartyl, 
and metalloproteases (1). In this highly complicated process, a cas- 
cade of events requiring a variety of proteases seems to be involved. 
Numerous reports have demonstrated an increased production of 
extracellular matrix degrading enzymes, including type IV collagen- 
ase (MMP-2), cathepsin B, cathepsin D, and serine proteases such as 
plasminogen activator in tumor cells (1). The proteolytic enzymes of 
the serine protease family exist as single-chain or double-chain zy- 
mogens activated by specific and limited proteolytic cleavage. They 
contain the three active-site amino acids histidine, aspartate, and 
serine, which participate in peptide bond hydrolysis. The geometric 
orientation of this catalytic triad is similar in different serine 
proteases, despite the fact that folding of the proteases may be 
different (2). 

In the present study, we report the cloning and characterization of
a novel serine protease identified in a recent cDNA-RDA approach
(3). This study was designed to isolate gene fragments highly overexpressed
in pancreatic cancer compared with normal pancreas and
chronic pancreatitis tissue. From the 16 gene fragments isolated in this
study, we selected the 313-bp gene fragment RDA12 (GenBank
accession no. U54603) for further characterization. Database comparison
revealed a moderate homology to a number of serine proteases,
indicating that RDA12 may be a fragment of a novel protease with
cancer-specific expression.

Received 12/17/99; accepted 3/30/00.

The costs of publication of this article were defrayed in part by the payment of page
charges. This article must therefore be hereby marked advertisement in accordance with
18 U.S.C. Section 1734 solely to indicate this fact.

This work was supported by grants from the Bundesministerium für Bildung und
Forschung (01 GB940I), the European Community (BMH4-CT98-3085), and the Deutsche
Forschungsgemeinschaft (SFB518, project BI; to T. M. G.).

The nucleotide sequence in this report has been submitted to the GenBank Data
Library with accession no. AF179224.

To whom requests for reprints should be addressed, at Department of Internal
Medicine I, University of Ulm, 89081 Ulm, Germany. Phone: 49-731-5024385; Fax:
49-731-5024302; E-mail: thomas.gress@medizin.uni-ulm.de.

The abbreviations used are: RDA, representational difference analysis; PNGase F,
peptide-N-glycosidase F.

Materials and Methods

Materials. Human tissue from patients with ductal adenocarcinoma of the 
pancreas (n = 13), carcinoma tissues of different origin, human pancreatic
tissue from organ donors (n = 6), and chronic pancreatitis tissue (n = 6) was
provided by the Hungarian Academy of Sciences (Budapest, Hungary) and the 
Department of Surgery of the University of Ulm. All tissue samples were 
obtained after approval by the local Ethics Committee. 

The human pancreatic cancer cell lines were obtained from the following
suppliers: PATU-8988S and PATU-8988T (German Collection of Microorganisms
and Cell Cultures, Braunschweig, Germany); PANC-1 and MIA-PaCa-2
(European Collection of Animal Cell Cultures, Salisbury, United
Kingdom); HPAF (Metzgar, Durham, NC); Capan-1, Capan-2, and AsPC-1
(Cell Lines Service, Heidelberg, Germany); Patu II (Elsasser, Marburg,
Germany); PC2 (Bulow, Mainz, Germany); SUIT-2 (S2-007, S2-013, S2-020, and
S2-028; Iwamura, Miyazaki, Japan; Ref. 4); and SKPC2 and IMIM-PC2
(P. Real, IMIM, Barcelona, Spain).

Cloning of a New Serine Protease cDNA. In a recent screen for differentially
expressed genes in pancreatic carcinoma, the 313-bp gene fragment
RDA12 (accession no. U54603) was isolated by cDNA-RDA (3); this fragment
encodes the putative motif of a new serine protease. The RDA12 fragment was
used to screen ~20,000 clones of an oligo(dT)-primed cDNA library from a
pancreatic cancer cell line by hybridization. Both strands of the longest cDNA
clone, RDA12/2, were sequenced by primer walking. For stable transfection in
mammalian cells, the cDNA clone RDA12/2 was cloned in sense and antisense
orientation into the BamHI site of the mammalian expression vector
pHβ-Apr1-neo (5). A COOH-terminal-tagged TMPRSS3 expression vector was
constructed by insertion of a 1427-bp fragment (nucleotides 96-1522)
containing the open reading frame of TMPRSS3 into the BstXI site of the mammalian
expression vector pcDNA6/V5/His B (Invitrogen, San Diego, CA).

Northern Blot Analyses. The expression of TMPRSS3 was studied by 
hybridizations using Northern blots containing 30 μg each of total RNA from
normal pancreas tissue, chronic pancreatitis tissue, different carcinoma tissues,
and cell lines. The Northern blots containing RNA of different human tissues
were purchased from Clontech (Heidelberg, Germany). 

Cell Culture and Transfection. For functional analysis of TMPRSS3, the 
S2-020 pancreatic cancer cell line, which expresses no endogenous TMPRSS3 
mRNA, was transfected with the TMPRSS3-pHβ-Apr1-neo construct in sense
and antisense orientation using DMRIE-C (Life Technologies, Inc., Eggenstein,
Germany). Several clones were picked that showed various degrees of
stable TMPRSS3 sense/antisense mRNA expression. Two of each sense and 
antisense clones were used for functional assays. 

HEK-293 cells were plated at 1.5 x 10** cells/10-cm dish and grown
overnight in DMEM supplemented with 10% FCS. Cells were transiently
transfected with the TMPRSS3-pcDNA6/V5/His plasmid DNA by use of the
calcium phosphate protocol. 






NOVEL SERINE PROTEASE IN PANCREATIC CANCER

1 ACACAGAGAGAGGCAGCAGCTTGCTCAGCGGACA
35 AGGATGCTGGGCGTGAGGGACCAAGGCCTGCCCTGCACTCGGGCCTCCTCCAGCCAGTGCTGACCAGGGACTTCTGACC^ 
125 CAGGACCTGTGTGGGGAGGCCCTCCTGCTGCCTTGGGGTGACAATCTCAGCTCCAGGCTACAGGGAGACCGGGAGGATCACAGAGCCAGC 
215 ATGTTACAGGATCCTGACAGTGATCAACCTCTGAACAGCCTCGATGTCAAACCCCTGCGCAAACCCCGTATCCCCATGGAGACCTTCAGA
1 MLQDPDSDQPLNSLDVKPLRKPRIPMETFR

305 AAGGTGGGGATCCCCATCATCATAGCACTACTGAGCCTGGCGAGTATCATCATTGTGGTTGTCCTCATCAAGGTGATTCTGGATAAATAC

31 KVGIPIIIALLSLASIIIVVVLIKVILDKY
395 tacttcctctgcgggcagcctctccacttcatcccgaggaagcagctgtgtgacggagagctggactgtcccttgggggaggacgagga^

61 YFLCGQPLHFIPRKQLCDGELDCPLGEDEE

485 cactgtgtcaagagcttccccgaagggcctgcagtggcagtccgcctctccaaggaccgatccacactgcaggtgctggactcggcc^^
91 HCVKSFPEGPAVAVRLSKDRSTLQVLDSAT

575 gggaactggttctctgcctgtttcgacaacttcacagaagctctcgctgagacagcctgtaggcagatgggctacagcagc;^ 

121 GNWFSACFDNFTEALAETACRQMGYSSKPT 



665 TTCAGAGCTGTGGAGATTGGCCCAGACCAGGATCTGGATGTTGTTGAAATCACAGAAAACAGCCAGGAGCTTCGCATGCGGAACTCAAGT
151 FRAVEIGPDQDLDVVEITENSQELRMRNSS



755 GGGCCCTGTCTCTCAGGCTCCCTGGTCTCCCTGCACTGTCTTGCCTGTGGGAAGAGCCTGAAGACCCCCCGTGTGGTGGGTGGGGAGGAG 
181 GPCLSGSLVSLHCLACGKSLKTPRVVGGEE

845 GCCTCTGTGGATTCTTGGCCTTGGCAGGTCAGCATCCAGTACGACAAACAGCACGTCTGTGGAGGGAGCATCCTGGACCCCCACTGGGTC
211 ASVDSWPWQVSIQYDKQHVCGGSILDPHWV 

935 CTCACGGCAGCCCACTGCTTCAGGAAACATACCGATGTGTTCAACTGG AAGGTGCGGGCAGGCTCAGACAAACTGGGCAGCTTCCCATCC 
241 LTAAHCFRKHTDVFNWKVRAGSDKLGSFPS 

▲ 

1025 CTGGCTGTGGCCAAGATCATCATCATTGAATTCAACCCCATGTACCCCAAAGACAATGACATCGCCCTCATGAAGCTGCAGTTCCCACTC
271 LAVAKIIIIEFNPMYPKDNDIALMKLQFPL

1115 ACT^CTCAGGCACAGTCAGGCCCATCTGTCTGCCCTTCTTTGATGAGGAGCTCACTCCAGCCACCCCACTCTGGATCATTGGATGGGGC 
301 TPSGTVRPICLPFFDEELTPATPLWIIGWG 

12 05 TTTACGAAGCAGAATGGAGGGAAGATGTCTGACATACTGCTGCAGGCGTCIAGTCCAGGTCATTGACAGCACACGGTGCAATGCAGACG 
331FTKQNGGKMSDILLQASVQVIDSTRCNADD 

1295 GCGTACCAGGGGGAAGTCACCGAGAAGATGATGTGTGCAGGCATCCCGGAAGGGGGTGTGGACACCTGCCAGGGTGACAGTGGTGGGCCC
361 AYQGEVTEKMMCAGIPEGGVDTCQGDSGGP

1385 CTGATGTACCAATCTGACCAGTGGCATGTGGTGGGCATCGTTAGCTGGGGCTATGGCTGCGGGGGCCCGAGCACCCCAGGAGTATACACC
391 LMYQSDQWHVVGIVSWGYGCGGPSTPGVYT 

1475 AAGGTCTCAGCCTATCTCAACTGGATCTACAATGTCTGGAAGGCTGAGCTGTAATGCTGCTGCCCCTTTGCAGTGCTGGGAGCCGCTTCC
421 KVSAYLNWIYNVWKAEL* 

1565 TTCCTGCCCTGCCCACCTGGGGATCCCCCAAAGTCAGACACAGAGCAAGAGTCCCCTTGGGTACACCCCTCTGCCCACAGCCTCAGCATT 

1655 TCTTGGAGCAGCAAAGGGCCTCAATTCCTATAAGGAACCCTCGCAGCCCAGAGGCGCCCAGAGGAAGTCAGCAGCCCTAGCTCGGCCACA 

1745 CTTGGTGCTCCCAGCATCCCAGGGAGAGACACAGCCCACTGAACAAGGTCTCAGGGGTATTGCTAAGCCAAGAAGGAACTTTCCCACACT

1835 ACTGAATGGAAGCAGGCTGTCTTGTAAAAGCCCAGATCACTGTGGGCTGGAGAGGAGAAGGAAAGGGTCTGCGCCAGCCCTGTCCGTTTT

1925 CACCCATCCCCAAGCCTACTAGAGCAAGAAACCAGTTGTAATATAAAATGCACTGCCCTACTGTTGGTATGACTACCGTTACCTACTGTT 

2015 GT CATTGTTATTAC AG CT ATGG CO ACT ATT ATT AAA G AGCTGTGT AACATTTCTGGCAAAAAAAAAA 

Fig. 1. Nucleotide sequence of the cDNA coding for human TMPRSS3 and its predicted amino acid sequence. The bold nucleotide sequence 1189-1501 represents the initially
isolated RDA12 gene fragment; the underlined nucleotides 2045-2050 mark the potential polyadenylation signal. The amino acid sequence highlighted by a gray box represents the
potential transmembrane domain. ▲ indicates the active-site residues histidine (H), aspartate (D), and serine (S). Double underlines indicate potential N-linked glycosylation sites.




Preparation of Cell Extracts and Subcellular Fractionation. Forty-eight 
h after transient transfection with V5-tagged TMPRSS3 into HEK-293 cells, 
protein extracts were prepared by resuspending pelleted cells in 1% Triton 
X-100, 1% sodium deoxycholate, 0.1% SDS, 150 mM NaCl, 50 mM Tris-HCl
(pH 7.2) supplemented with 5 μg/ml aprotinin, 5 mM Pefabloc, and 10 μg/ml
pepstatin. For immunopurification of the epitope-tagged protein, cell lysates
were incubated with V5 antibody conjugated to protein G-agarose beads at 4°C
for 4 h on a shaker. The agarose beads were pelleted by centrifugation and
washed twice with 150 mM NaCl, 5 mM EDTA, 50 mM Tris + 0.1% NP40. The
washed pellets were resuspended in 150 mM NaCl, 5 mM EDTA, 50 mM 
Tris + 0.1% NP40 for PNGase F treatment. 

Subcellular fractions were prepared from transiently transfected HEK-293
cells as reported previously (6). The plasma membrane-enriched fraction,
which was prepared using sucrose density gradient centrifugation, the cytosolic 
fraction, and concentrated culture medium were studied by Western blot 
analysis. 



Glycosylation. For PNGase F treatment, immunopurified protein was incubated
overnight with 2 units of PNGase F supplemented with 10 mM EDTA
at 37°C. Inhibition of N- and mucin-like O-glycosylation was performed by
cultivating TMPRSS3-expressing HEK-293 cells for 24 h in DMEM, 10%
FCS containing either 2.5 μg/ml tunicamycin (7) or 2 mM
phenyl-N-acetyl-α-D-galactosaminide (8). Thereafter, cells were harvested for
protein extraction.

Functional Assays. Nude mouse experiments were done by injecting
2 X 10* S2-020 cells stably transfected with TMPRSS3 sense/antisense constructs,
both s.c. and in the tail vein of female nu/nu mice. Five weeks after the
tail vein injections, the lung, spleen, and liver were used for standard histological
analysis to identify the presence or absence of metastatic lesions.
Subcutaneous tumors were measured and used for histological analysis. 

In vitro Matrigel invasion assays were done by seeding 10* transfected cells
in medium + 1% FCS in the upper chamber of Matrigel-coated 8-μm transwell
plates. The lower chamber was filled with medium + 10% FCS. The
number of invading cells adhering to the lower side of the porous membrane
was counted after fixation with 4% paraformaldehyde and staining with
methylene blue.

Fig. 2. Northern blot analyses of the TMPRSS3 transcript in
different tissues and cell lines. The Northern blots contain 30 μg
of total RNA per lane from normal human pancreas (n = 6),
chronic pancreatitis tissue (n = 6), pancreatic carcinoma tissue
(n = 13; Lanes 1-13), and cancer tissues of different origin
(Lanes 14-16, 19-21, and 23, colorectal carcinoma; Lanes 17
and 25-27, gastric cancer; Lane 22, soft tissue sarcoma; Lane 18,
breast cancer; Lane 24, carcinoma of the papilla Vateri) and the
SUIT-2 subclones S2-028, S2-013, and S2-007. RNAs from
normal pancreas, chronic pancreatitis, and pancreatic cancer
tissue samples were run on the same Northern blot gels. The
autoradiographs for cancer and control tissues are shown separately
for improved presentation of the data.

[Fig. 2 panel labels: normal pancreas; chronic pancreatitis; pancreatic cancer tissue (Lanes 1-13); different cancer tissues (Lanes 14-27); transcript size marker, 2.3 kb]

The proteolytic activity in TMPRSS3 sense/antisense-transfected S2-020 
cells and transiently transfected HEK-293 cells was determined fluorometri- 
cally in native lysates and lysates treated with enterokinase for activation, 
using oligopeptide substrates for elastase-like (Ala-Ala-Ala-Ala) and trypsin-like
(Ile-Pro-Arg) serine proteases as described previously (9).

Chromosomal Mapping of the TMPRSS3 Gene Locus. The chromosomal
localization of TMPRSS3 was determined by screening the GeneBridge4
radiation hybrid panel (Research Genetics, Huntsville, AL), using the
TMPRSS3-specific primers 5'-CATGTGGTGGGCATCGTTA-3' and
5'-CCAGTTGAGATAGGCTGAG-3'.
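The placement of these two primers on the cDNA can be checked with a short sketch. It is only an illustration: the template string is nucleotides 1385-1504 as transcribed here from Fig. 1 (an assumption, since the scan of that figure is imperfect), and the function names are hypothetical, not part of the original study.

```python
# Template: nucleotides 1385-1504 of the TMPRSS3 cDNA, transcribed from Fig. 1.
# This transcription is an assumption made for illustration.
TEMPLATE_START = 1385
TEMPLATE = (
    "CTGATGTACCAATCTGACCAGTGGCATGTGGTGGGCATCGTTAGCTGGGGCTATGGCTGC"
    "GGGGGCCCGAGCACCCCAGGAGTATACACC"
    "AAGGTCTCAGCCTATCTCAACTGGATCTAC"
)

# Primer sequences as given in the Materials and Methods text.
FORWARD = "CATGTGGTGGGCATCGTTA"
REVERSE = "CCAGTTGAGATAGGCTGAG"


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def predicted_amplicon(template, forward, reverse, offset):
    """Locate both primers on the sense strand.

    Returns (first_nt, last_nt, length) in cDNA numbering,
    or None if either primer site is not found.
    """
    start = template.find(forward)
    site = reverse_complement(reverse)  # reverse primer site on the sense strand
    rev_start = template.find(site, start)
    if start < 0 or rev_start < 0:
        return None
    end = rev_start + len(site)
    return offset + start, offset + end - 1, end - start


amplicon = predicted_amplicon(TEMPLATE, FORWARD, REVERSE, TEMPLATE_START)
print(amplicon)
```

Under this transcription, both primers fall inside the 313-bp RDA12 fragment (nucleotides 1189-1501) near the 3' end of the open reading frame.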

Results and Discussion 

The 313-bp fragment encoding the putative motif of a new serine
protease isolated in a recent cDNA-RDA screen for genes differentially
expressed in pancreatic cancer (3) was used to screen a pancreatic
cancer cDNA library. Among 16 isolated homologous clones, a
clone designated RDA12/2 contained the full-length sequence. The
sequence of clone RDA12/2 comprised 2071 bp, including a 214-bp
5' untranslated region, an open reading frame of 1311 nucleotides,
and a 546-bp 3' untranslated region (Fig. 1). Translation of the open
reading frame suggests that the cDNA codes for a putative polypeptide
of 437 amino acids with an estimated molecular mass of 48.202
kDa. The NH2-terminal region of the hypothetical protein contains a
putative signal-anchor sequence characteristic for group II integral
membrane proteins. The highly hydrophobic region of 22 amino acids
may serve as a transmembrane domain that is involved in anchoring
the protease to the cell membrane. According to the charge difference
rule (10), it can be assumed that the COOH terminus of the protein
with its protease module is located on the extracellular surface.
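The hydrophobic segment described above can be located with a sliding-window hydropathy average. The sketch below is an illustration only: it uses the standard Kyte and Doolittle scale (20) with the 17-residue window quoted in the Fig. 3 legend, and the input is the first 60 residues as transcribed here from Fig. 1, which is an assumption. The function name is hypothetical and this is not the web tool the authors used.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Residues 1-60 of the predicted protein, transcribed from Fig. 1 (assumption).
N_TERMINUS = (
    "MLQDPDSDQPLNSLDVKPLRKPRIPMETFR"
    "KVGIPIIIALLSLASIIIVVVLIKVILDKY"
)


def hydropathy_profile(seq, window=17):
    """Return (center_residue, mean_hydropathy) for every full window.

    center_residue is 1-based, matching the numbering used in the text.
    """
    half = window // 2
    profile = []
    for i in range(len(seq) - window + 1):
        mean = sum(KYTE_DOOLITTLE[aa] for aa in seq[i:i + window]) / window
        profile.append((i + half + 1, mean))
    return profile


profile = hydropathy_profile(N_TERMINUS)
peak_center, peak_score = max(profile, key=lambda item: item[1])
print(peak_center, round(peak_score, 2))
```

With this input, the most hydrophobic window is centered inside residues 32-53, the span assigned to the putative transmembrane domain.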

Although the nucleotide sequence is unique, database comparisons
of the amino acid sequence revealed a homology to a number of serine
proteases. Thirty-five percent identity and ~50% similarity was found
to members of the serine protease family known as the human transmembrane
proteases, TMPRSS1/hepsin (11) or TMPRSS2 (12). Thus,
our new protease is the third member of a family of transmembrane-bound
serine proteases. Consequently, this new gene was named
TMPRSS3 for transmembrane protease, serine 3. Sequence homology
was high in the domains containing the three principal active-site
amino acids H245, D290, and S387, required for peptide bond hydrolysis.
The arrangement of the catalytic residues in the linear sequence
defines the membership of TMPRSS3 to the S1 family of the chymotrypsin
clan SA of serine-type peptidases (2). The prototype of this
family is chymotrypsin, and the three-dimensional structures of some
of its members have already been resolved (12).



TMPRSS3 is predicted to cleave in a trypsin-like manner after 
lysine or arginine residues because it contains D381 at the base of the
specificity pocket that binds the substrate (13). In addition, the novel 
protein shares considerable structural similarities with the TMPRSS
family, including the putative NH2-terminal membrane anchor and the 
conserved cysteine residues, which by homology most likely form the 
disulfide bonds C'^^'^-C^^", C^-•»*^-C^^^ C^^^-C-''^ and C^«^-C*'**. 
Serine proteases are most commonly synthesized as inactive proenzymes,
which are activated by extracellular, proteolytic removal
of a propeptide. At the NH2-terminal part of the protease domain,
TMPRSS3 contains the peptide sequence RVVGG, which is typical
for the proteolytic activator site of many protease zymogens. The
potential cleavage between R204 and V205 would result in a new
terminal α-amino group, which forms a salt bridge with D386 and
thereby leads to the assembly of the functional catalytic sites. Therefore,
the activated form would consist of a non-protease and a protease
subunit linked by a disulfide bond that most likely involves C**^- 
C^'**. Whether this activation is mediated under physiological condi- 
tions by autocatalytic cleavage or other proteases is not known. The 
TMPRSS3 gene locus was localized to chromosome 11 at q23.3
between the markers D11S4362 and D11S4387 by use of a radiation
hybrid panel. 
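The sequence features discussed in the preceding paragraphs can be cross-checked against the Fig. 1 translation with a few lines of code. This is a sketch under stated assumptions: the 437-residue string is transcribed here from the scanned Fig. 1, and the residue numbers are read off that transcription by position and by homology to chymotrypsin-type proteases; the variable names are hypothetical.

```python
# Predicted TMPRSS3 polypeptide, transcribed from Fig. 1 in rows of 30 residues.
# The transcription is an assumption; the scanned figure is imperfect.
TMPRSS3_PROTEIN = (
    "MLQDPDSDQPLNSLDVKPLRKPRIPMETFR"  # 1-30
    "KVGIPIIIALLSLASIIIVVVLIKVILDKY"  # 31-60
    "YFLCGQPLHFIPRKQLCDGELDCPLGEDEE"  # 61-90
    "HCVKSFPEGPAVAVRLSKDRSTLQVLDSAT"  # 91-120
    "GNWFSACFDNFTEALAETACRQMGYSSKPT"  # 121-150
    "FRAVEIGPDQDLDVVEITENSQELRMRNSS"  # 151-180
    "GPCLSGSLVSLHCLACGKSLKTPRVVGGEE"  # 181-210
    "ASVDSWPWQVSIQYDKQHVCGGSILDPHWV"  # 211-240
    "LTAAHCFRKHTDVFNWKVRAGSDKLGSFPS"  # 241-270
    "LAVAKIIIIEFNPMYPKDNDIALMKLQFPL"  # 271-300
    "TPSGTVRPICLPFFDEELTPATPLWIIGWG"  # 301-330
    "FTKQNGGKMSDILLQASVQVIDSTRCNADD"  # 331-360
    "AYQGEVTEKMMCAGIPEGGVDTCQGDSGGP"  # 361-390
    "LMYQSDQWHVVGIVSWGYGCGGPSTPGVYT"  # 391-420
    "KVSAYLNWIYNVWKAEL"               # 421-437
)

# Lengths reported in the text for clone RDA12/2.
UTR_5, ORF, UTR_3, CDNA = 214, 1311, 546, 2071


def residue(seq, position):
    """Return the amino acid at a 1-based position."""
    return seq[position - 1]


# Activation motif: cleavage would occur after the arginine of RVVGG.
activation_arg = TMPRSS3_PROTEIN.find("RVVGG") + 1  # 1-based position of R

# Candidate catalytic triad and the two aspartates discussed in the text.
triad = {pos: residue(TMPRSS3_PROTEIN, pos) for pos in (245, 290, 387)}
pocket_asp = residue(TMPRSS3_PROTEIN, 381)
salt_bridge_asp = residue(TMPRSS3_PROTEIN, 386)

print(len(TMPRSS3_PROTEIN), activation_arg, triad, pocket_asp, salt_bridge_asp)
```

The length checks tie the figure to the text: 437 codons account for the 1311-nucleotide open reading frame, and the three segments sum to the 2071-bp clone.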

As anticipated, an overexpression of the 2.3-kb transcript was
found in 9 of 13 primary pancreatic carcinoma tissues (Fig. 2) and in
10 of 16 pancreatic carcinoma cell lines (not shown) by Northern blot
analysis. Because TMPRSS3 was not expressed in normal pancreas
(n = 6) and in chronic pancreatitis (n = 6) tissue samples, overexpression
appears to be cancer-specific and not due to inflammatory
alterations in the stroma. No clear correlation was found between the
stage of pancreatic tumors and the expression of the protease (Table
1). Northern blot analyses with RNA from a small number of other
tumor tissues revealed that TMPRSS3 overexpression is not restricted
to pancreatic cancer, but can also be found in gastric (n = 4),
colorectal (n = 7), and ampullary (n = 1) cancer. No expression was
found in one tissue sample each of soft tissue sarcoma and breast
cancer (Fig. 2). TMPRSS3 transcripts were not detectable in normal
heart, brain, placenta, lung, liver, skeletal muscle, uterus, and adipose
tissue. A weak signal was found in tissues of the normal gastrointestinal
tract (esophagus, stomach, small intestine, colon) and in some
tissues of the urogenital tract (kidney and bladder). Nevertheless,
expression was much weaker than in the corresponding tumors (data
not shown). Furthermore, we analyzed the expression of TMPRSS3 in
the SUIT-2 clonal cell lines S2-007, S2-013, and S2-028 (4). These
subclones of the human pancreatic cancer cell line SUIT-2 differ in
their spontaneous metastatic potential after s.c. injection in nude mice.
In this setting S2-007 regularly shows a high rate of metastases,
whereas the other two cell lines show a lower rate (S2-013) or no
metastases at all (S2-028). As shown in Fig. 2, the strength of
TMPRSS3 expression correlated well with the metastatic potential of the
SUIT-2 subclones, which may serve as an indication that this serine
protease is associated with the promotion of metastasis.

Table 1 TNM classification of pancreatic cancer patients

Tissue sample    TNM classification
1                T3N,Mo
2                T3N,Mo
3                T.N, Mo
4
5                TjNiM^
6                TjNiMo
7                T^N^Mo
8                TjNoMo
9                TjN.Mo
10               TjNjMo
11               TsN.Mo
12               T4N0M,
13               T,N,M,

Fig. 3. a, hydropathicity plot of the predicted TMPRSS3
protein. The method of Kyte and Doolittle
(20) was used, using a window of 17 residues
(http://bioinformatics.weizmann.ac.il/hydroph/). The peak
spanning amino acids 32-53 represents the putative
transmembrane domain. b, schematic representation
of the different domains of TMPRSS3, a type II
membrane-associated serine protease. Numbers correspond
to the amino acids, deduced from the cDNA
sequence shown in Fig. 1. The disulfide bonds were
deduced based on the structure of TMPRSS1 and
TMPRSS2, the most homologous proteins. pot.,
potential.

[Fig. 3 panel labels: a, hydrophobic/hydrophilic axis versus residue number (0-400); b, NH2 terminus, cytoplasmic, transmembrane, and extracellular regions, protease domain, pot. N-glycosylation sites, pot. disulfide bonds, COOH terminus]

The sequence of TMPRSS3 suggests that this novel serine protease
contains a signal anchor characteristic for group II integral membrane
proteins with a hydrophobic transmembrane domain (Fig. 3a). According
to the charge difference rule (10), the transmembrane domain
(amino acids 32-53) anchors the protease to the cell membrane.
Because of this anchorage, the NH2-terminal domain (amino acids
1-31) would appear to be located intracellularly, and the COOH-terminal
region (amino acids 54-437), which contains the catalytic
domain, would be located extracellularly (Fig. 3b). The predicted
subcellular localization of the protease was confirmed using a V5-tagged
TMPRSS3 construct, which was transiently transfected into HEK-293
cells. Membrane fractionation and Western blotting with the corresponding
anti-V5 antibody revealed a signal only in the plasma
membrane-enriched fraction, whereas no tagged TMPRSS3 protein
was detectable in the cytosol and in the culture medium (Fig. 4).

This experiment also uncovered post-translational modifications of
TMPRSS3. Although the calculated theoretical molecular mass of the
epitope-tagged fusion protein is 52 kDa, its size in an SDS-polyacrylamide
gel is ~68 kDa, suggesting the presence of potential carbohydrate
moieties. The primary sequence of TMPRSS3 displays two
consensus motifs for N-linked glycosylation (N-X-T/S) at N130 and
N178. To confirm this N-glycosylation, epitope-tagged TMPRSS3 was
expressed in HEK-293 cells, immunoprecipitated, and treated with
PNGase F. This resulted in an increase in mobility on denaturing
SDS-PAGE, demonstrating N-glycosylation of TMPRSS3 (Fig. 4).
Cultivation of transfected HEK-293 cells in the presence of tunicamycin,
an inhibitor of N-glycosylation, revealed the same mobility
shift of TMPRSS3 to a molecular mass of 60 kDa.
Phenyl-N-acetyl-α-D-galactosaminide, which inhibits mucin-like O-glycosylation, had
no effect on the molecular mass (data not shown). The generation of
recombinant proteases frequently has been shown to be difficult or
impossible (14). Despite extensive and repeated efforts, we were
unable to successfully generate recombinant protein in Escherichia
coli and insect cells, possibly because TMPRSS3, as many other
proteases, had a cytotoxic effect on transfected cells. Repeated efforts
to generate peptide antisera failed as well (data not shown), and a
TMPRSS3 antibody was therefore not available for further studies.
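The N-X-T/S consensus search mentioned above can be sketched as a simple pattern scan. The example is illustrative only: the input is residues 1-180 as transcribed here from Fig. 1 (an assumption), the sequon is taken in its common form where X is any residue except proline, and the function name is hypothetical.

```python
import re

# Residues 1-180 of the predicted protein, transcribed from Fig. 1 (assumption).
TMPRSS3_N_TERMINAL_180 = (
    "MLQDPDSDQPLNSLDVKPLRKPRIPMETFR"
    "KVGIPIIIALLSLASIIIVVVLIKVILDKY"
    "YFLCGQPLHFIPRKQLCDGELDCPLGEDEE"
    "HCVKSFPEGPAVAVRLSKDRSTLQVLDSAT"
    "GNWFSACFDNFTEALAETACRQMGYSSKPT"
    "FRAVEIGPDQDLDVVEITENSQELRMRNSS"
)

# Lookahead so that overlapping candidate sites are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq):
    """Return 1-based positions of asparagines in N-X-S/T motifs (X is not P)."""
    return [match.start() + 1 for match in SEQUON.finditer(seq)]


sites = find_sequons(TMPRSS3_N_TERMINAL_180)
print(sites)
```

A motif match only marks a potential site; whether it is used is an experimental question, which is why the text relies on the PNGase F and tunicamycin mobility shifts.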

Whereas the established physiological role of the chymotrypsin
family of secreted serine proteases is primarily in protein catabolism,
the function of serine proteases of the TMPRSS family is of special
interest. Although the function of TMPRSS2 remains unknown (12,
15), TMPRSS1, also known as hepsin, frequently is overexpressed in
ovarian tumors and may therefore contribute to the invasive nature or
growth capacity of ovarian tumor cells (16). Treatment of hepatoma
cells with antihepsin antibodies or specific antisense oligonucleotides
confirmed that hepsin plays an essential role in cell growth and
maintenance of cell morphology (17). It has also been shown that
hepsin can proteolytically activate human coagulation factor VII and
thereby contribute to the activation of the coagulation cascade (18).

Fig. 4. Western blot analysis of V5-tagged TMPRSS3 protein. Protein extracts from
TMPRSS3-pcDNA6/V5/His-transfected HEK-293 cells were resolved in 9% SDS-PAGE
and transferred to nitrocellulose membranes. Membranes were immunoblotted with an
anti-V5-horseradish peroxidase antibody followed by chemiluminescence detection. a, 20
μg of total protein extract. b, subcellular localization: C, cytosolic fraction; M, plasma
membrane-enriched fraction; S, concentrated culture medium. c, analysis of N-linked
glycosylation of the TMPRSS3 protein. A shift in molecular mass was detected both after
PNGase F treatment of the immunoprecipitated protein and after exposure of the transfected
cells to tunicamycin, indicating N-glycosylation of the protein.

[Fig. 4 panel labels: lanes M and S; band sizes 68 kDa and 60 kDa]

The correlation of TMPRSS3 expression with the metastatic potential
of the SUIT-2 cell lines is a first indication that this new protease,
in the same way as hepsin, may be involved in promoting metastasis
formation and tumor invasion. To confirm this hypothesis in functional
assays, stably transfected S2-020 cell lines were generated
using the TMPRSS3 cDNA cloned in sense and antisense orientation
into the pHβ-Apr1-neo vector. Several clones were generated showing
variable degrees of TMPRSS3 sense/antisense mRNA transcription.
Two sense and two antisense clones were further characterized
by s.c. injections in nude mice, in vitro Matrigel invasion assays, and
biochemically for their capacity to hydrolyze substrates for trypsin
and elastase. No significant differences could be observed between
sense and antisense clones in any of the functional assays. There was
no difference in tumor size and local invasiveness after s.c. injections,
and there was no evidence of metastasis formation after tail vein
injection with both sense and antisense cells. Similarly, we failed to
show an effect on in vitro invasiveness and on proteolytic activity of
native and enterokinase-treated lysates for a selection of serine protease
substrates. Many factors may be responsible for the failure of
TMPRSS3-transfected tumor cells to behave differently in these assays,
including the necessity for a complex activation mechanism, processes
that affect protein folding, or the absence of essential cofactors.
Furthermore, although transiently transfected HEK-293 cells showed
expression of the V5-tagged recombinant TMPRSS3 protein, we
could not directly demonstrate expression of the protein in the transfected
cells because we lacked a specific antibody. In the absence of
final experimental proof, we can therefore only hypothesize, based on
the structural characteristics and the expression pattern in cancer
tissues and in the SUIT-2 subclones, that this new protease has a
potential role for tumor progression, metastasis formation, and tumor
invasion.

Proteases have an important function in the context of tumor 
growth, because they can break down the surrounding extracellular 
matrix components, they can pave the way for spreading tumor cells, 
and they can release and activate growth and angiogenic factors. 
Protease activity on the surface of tumor cells is required to allow 
malignant invasion through surrounding connective tissue, which is an 
important event in the multistep process of metastasis formation (19). 
Thus, it is conceivable that TMPRSS3 may contribute to the invasive 
and metastatic potential of tumor cells. In this context, cell surface 
proteases such as TMPRSS3 may function as an activator of other
extracellular proteases or act directly by degrading the extracellular 
matrix surrounding the tumor cells. Furthermore, TMPRSS3, as 
shown for many other proteases, may participate in the activation of 
hormones or growth factors by proteolytic cleavage of inactive proforms.
Because the biochemical events required for the activation of
this novel serine protease are unknown and the specific substrates
have not yet been identified, the precise role of TMPRSS3 in carcinogenesis
remains to be elucidated.

Acknowledgments 

We thank G. Adler for continual support, U. Lacher for excellent technical 
assistance, M. A. Hollingsworth for the pHβ-Apr1-neo vector, and F. Gansauge
and G. Varga for providing human pancreatic tissue samples.

References 

1. DeClerck, Y. A., and Imren, S. Protease inhibitors: role and potential therapeutic use
in human cancer. Eur. J. Cancer, 30A: 2170-2180, 1994.

2. Rawlings, N. D., and Barrett, A. J. Families of serine peptidases. Methods Enzymol.,
244: 19-61, 1994.

3. Gress, T. M., Wallrapp, C., Frohme, M., Müller-Pillasch, F., Lacher, U., Friess, H.,
Büchler, M., Adler, G., and Hoheisel, J. D. Identification of genes with specific
expression in pancreatic cancer by cDNA representational difference analysis. Genes
Chromosomes Cancer, 19: 97-103, 1997.

4. Taniguchi, S., Iwamura, T., and Katsuki, T. Correlation between spontaneous metastatic
potential and type I collagenolytic activity in a pancreatic cancer cell line
(SUIT-2) and sublines. Clin. Exp. Metastasis, 10: 259-266, 1992.

5. Gunning, P., Leavitt, J., Muscat, G., Ng, S. Y., and Kedes, L. A human β-actin
expression vector system directs high-level accumulation of antisense transcripts.
Proc. Natl. Acad. Sci. USA, 84: 4831-4835, 1987.

6. Lutz, M. P., Pinon, D. I., Gates, L. K., Shenolikar, S., and Miller, L. J. Control of
cholecystokinin receptor dephosphorylation in pancreatic acinar cells. J. Biol. Chem.,
268: 12136-12142, 1993.

7. Elbein, A. D. Inhibitors of the biosynthesis and processing of N-linked oligosaccharides.
CRC Crit. Rev. Biochem., 16: 21-49, 1984.

8. Dasgupta, A., Takahashi, K., Cutler, M., and Tanabe, K. K. O-Linked glycosylation
modifies CD44 adhesion to hyaluronate in colon carcinoma cells. Biochem. Biophys.
Res. Commun., 227: 110-117, 1996.

9. Krüger, B., Lerch, M. M., and Tessenow, W. Direct detection of premature protease
activation in living pancreatic acinar cells. Lab. Investig., 78: 763-764, 1998.

10. Hartmann, E., Rapoport, T. A., and Lodish, H. F. Predicting the orientation of
eukaryotic membrane-spanning proteins. Proc. Natl. Acad. Sci. USA, 86: 5786-5790,
1989.

11. Leytus, S. P., Loeb, K. R., Hagen, F. S., Kurachi, K., and Davie, E. W. A novel
trypsin-like serine protease (hepsin) with a putative transmembrane domain expressed
by human liver and hepatoma cells. Biochemistry, 27: 1067-1074, 1988.

12. Paoloni Giacobino, A., Chen, H., Peitsch, M. C., Rossier, C., and Antonarakis, S. E.
Cloning of the TMPRSS2 gene, which encodes a novel serine protease with transmembrane,
LDLRA, and SRCR domains and maps to 21q22.3. Genomics, 44:
309-320, 1997.

13. Steitz, T. A., Henderson, R., and Blow, D. M. Structure of crystalline α-chymotrypsin.
3. Crystallographic studies of substrates and inhibitors bound to the active site of
α-chymotrypsin. J. Mol. Biol., 46: 337-348, 1969.

14. Anisowicz, A., Sotiropoulou, G., Stenman, G., Mok, S. C., and Sager, R. A novel
protease homolog differentially expressed in breast and ovarian cancer. J. Mol. Med.,
2: 624-636, 1996.

15. Lin, B., Ferguson, C., White, J. T., Wang, S., Vessella, R., True, L. D., Hood, L., and
Nelson, P. S. Prostate-localized and androgen-regulated expression of the membrane-bound
serine protease TMPRSS2. Cancer Res., 59: 4180-4184, 1999.

16. Tanimoto, H., Yan, Y., Clarke, J., Korourian, S., Shigemasa, K., Parmley, T. H.,
Parham, G. P., and O'Brien, T. J. Hepsin, a cell surface serine protease identified in
hepatoma cells, is overexpressed in ovarian cancer. Cancer Res., 57: 2884-2887,
1997.

17. Torres Rosado, A., O'Shea, K. S., Tsuji, A., Chou, S. H., and Kurachi, K. Hepsin, a
putative cell-surface serine protease, is required for mammalian cell growth. Proc.
Natl. Acad. Sci. USA, 90: 7181-7185, 1993.

18. Kazama, Y., Hamamoto, T., Foster, D. C., and Kisiel, W. Hepsin, a putative
membrane-associated serine protease, activates human factor VII and initiates a
pathway of blood coagulation on the cell surface leading to thrombin formation.
J. Biol. Chem., 270: 66-72, 1995.

19. Chen, W. T., Olden, K., Bernard, B. A., and Chu, F. F. Expression of transformation-associated
protease(s) that degrade fibronectin at cell contact sites. J. Cell Biol., 98:
1546-1555, 1984.

20. Kyte, J., and Doolittle, R. F. A simple method for displaying the hydropathic
character of a protein. J. Mol. Biol., 157: 105-132, 1982.






Exhibit 40 



Annual Reviews

www.annualreviews.org/aronline



Ann. Rev. Cell Biol. 1986. 2: 499-516

Copyright © 1986 by Annual Reviews Inc. All rights reserved 






MECHANISM OF PROTEIN 
TRANSLOCATION ACROSS 
THE ENDOPLASMIC 
RETICULUM MEMBRANE 



Peter Walter and Vishwanath R. Lingappa 

Department of Biochemistry and Biophysics and Departments of
Physiology and Medicine, University of California Medical School,
San Francisco, California 94143

CONTENTS

INTRODUCTION
HISTORICAL BACKGROUND
MECHANISM OF TARGETING
    Signal Recognition Particle
    Signal Sequences
    SRP Receptor
MECHANISM OF TRANSLOCATION
    Machinery
    Translocation Substrates
    Posttranslational Translocation in Yeast
    Posttranslational Translocation of Genetically Engineered Substrates
CONCEPTS AND CONTROVERSIES



INTRODUCTION 

In this review we attempt a timely survey of issues concerning protein
translocation across the membrane of the endoplasmic reticulum of
eukaryotic cells. We focus on recent developments, open questions and current
controversies. Due to limited space, this review cannot be and is not
intended to be comprehensive. Where appropriate, reference to more
detailed reviews is given in the text.

Eukaryotic cells contain a multiplicity of membrane-delimited compartments.
The selective localization of particular proteins provides the
basis for each of these compartments to serve various specialized functions.
Thus, for example, the mitochondrion is the exclusive residence of enzymes
involved in oxidative phosphorylation; similarly, oxidative detoxification
takes place exclusively in the endoplasmic reticulum (ER). The proteins
that compose, and are contained within, particular membrane systems are
kept there by the impermeability of the lipid bilayer to diffusion of proteins
across membranes. How then is compartmentalization of newly synthesized
proteins achieved, in view of the fact that the cytosol is the
common site of synthesis for the majority of proteins, though they are
destined for distinct subcellular locations? The term intracellular protein
topogenesis has been coined (Blobel 1980) to describe the specialized
mechanisms by which newly synthesized proteins selectively overcome the
permeability barrier of specific intracellular membranes to achieve their
correct subcellular localization. This review addresses the question of
how proteins that pass through or reside in the intracisternal space are
specifically synthesized on membrane-bound ribosomes and translocated
into the ER lumen.

As in the study of other protein translocation events (e.g. across
mitochondrial membranes) there are two fundamental issues to resolve
regarding transport across the ER membrane: (a) How is the target membrane
recognized and distinguished from all other membrane systems? (b) Once
it has been targeted, how is the polypeptide chain translocated across the
lipid bilayer into the lumen of the organelle?



HISTORICAL BACKGROUND

The work of Palade and coworkers on the secretory pathway (reviewed
by Palade 1975) focused attention on ribosomes bound to the rough
endoplasmic reticulum as the site of synthesis of secretory proteins. The
subsequent demonstration of vectorial discharge of puromycin-released
polypeptides into the lumen of isolated rough microsomal vesicles
(Redman & Sabatini 1966) suggested that a specialized mechanism was
responsible for translocation across the ER membrane: Nascent
polypeptides emerged into the lumen of the microsomal vesicles concomitant
with their synthesis. These results raised the intriguing question of how
the cell could distinguish the mRNAs for secretory proteins from those
for cytoplasmic or mitochondrial proteins and selectively translate the
former on ER-bound ribosomes.




The signal hypothesis (Blobel & Dobberstein 1975) was proposed to
account for these phenomena. Over the last 15 years overwhelming
evidence has accumulated from a plethora of experimental systems in favor
of this model. As it specifically relates to secretory proteins, the essential
tenets of an updated version of this hypothesis (for a recent review see
Walter et al 1984) are that: (a) the information for localization of newly
synthesized proteins into the lumen of the ER is encoded in a discrete
segment of the nascent polypeptide, the signal sequence; (b) this signal
sequence interacts with a series of receptors, some of them cytoplasmic,
others integral to the ER membrane. Some of these receptors function in
targeting the chain to the ER membrane, others function in its actual
translocation across that membrane. These latter receptors, together with
associated proteins in the ER membrane, constitute the "translocon," a
postulated engine able to drive signal sequence-bearing chains across the
ER membrane through a proteinaceous pore or channel.

More recently, the concepts of the signal hypothesis have been expanded
to describe a general framework for intracellular protein topogenesis
(Blobel 1980). According to this model, "topogenic sequences" within discrete
segments of targeted proteins are decoded by specific receptors, either
during (cotranslational) or shortly after (posttranslational) their
biosynthesis. The specificity of such signal sequence-receptor interactions targets
the proteins to the correct intracellular membranes where they are fed into
translocons that move them across the hydrophobic core of the lipid
bilayer. Similarly, it has been proposed that another class of topogenic
sequences, termed stop-transfer sequences, interacts with the translocon
to arrest further transport and thereby achieve an asymmetric
transmembrane orientation of integral membrane proteins. Thus many of the
concepts developed in this review for soluble ectoplasmic proteins are
directly applicable to the problem of integration of transmembrane
proteins. Recent developments reviewed below suggest that translocons in
different intracellular membrane systems may function more similarly than
previously thought.



MECHANISM OF TARGETING 



With the availability of in vitro systems that faithfully reproduce the
translocation of nascent proteins [secretory proteins (Blobel &
Dobberstein 1975), lysosomal proteins (Erickson et al 1983), and certain classes
of integral membrane proteins (Katz et al 1977)], it became feasible to
investigate the molecular requirements for protein translocation across the
ER membrane. So far, two components, the signal recognition particle
(SRP) and the SRP receptor, have been purified and shown to function in
the targeting events preceding the actual translocation event.

Signal Recognition Particle 

SRP is an 11 S small cytoplasmic ribonucleoprotein (Walter & Blobel 
1982). In our current view, SRP functions as an adapter between the 
protein synthetic machinery in the cytoplasm and the protein translocation 
machinery in the ER membrane. 



STRUCTURE OF SRP SRP was first recognized by its ability to restore the
translocation activity of salt-extracted microsomes in vitro (Warren &
Dobberstein 1978). It was purified to homogeneity from a salt extract of
canine pancreatic microsomal vesicles using this activity as an assay
(Walter & Blobel 1980). SRP consists of a small (300 nucleotide) 7SL RNA
(Walter & Blobel 1982) and six nonidentical polypeptide chains organized
into four SRP proteins. These proteins are two monomers, a 19-kDa
polypeptide and a 54-kDa polypeptide, and two heterodimers, one
composed of a 9-kDa and a 14-kDa polypeptide, and the other comprised of
a 68-kDa and a 72-kDa polypeptide (Siegel & Walter 1985). When SRP
is disassembled under nondenaturing conditions, the RNA and the protein
fractions are inactive by themselves, but together they can readily be
reconstituted into an active particle (Walter & Blobel 1983; Siegel &
Walter 1985).
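
The subunit composition described above can be tabulated as a small data structure. The sketch below is illustrative only: the class, field, and function names are assumptions introduced here, and the summed polypeptide mass is simple arithmetic on the nominal values quoted in the text, not a measured mass of the particle (the 7SL RNA is not included in the sum).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SrpProtein:
    """One SRP protein (hypothetical record type for illustration)."""
    chains_kda: Tuple[int, ...]  # nominal polypeptide masses in kDa, from the text

    @property
    def is_heterodimer(self) -> bool:
        # A protein built from two different chains is a heterodimer.
        return len(self.chains_kda) == 2


# Four SRP proteins: two monomers and two heterodimers, as stated in the text.
SRP_PROTEINS = (
    SrpProtein((19,)),
    SrpProtein((54,)),
    SrpProtein((9, 14)),
    SrpProtein((68, 72)),
)

RNA_LENGTH_NT = 300  # approximate length of 7SL RNA given in the text


def polypeptide_count(proteins) -> int:
    """Total number of polypeptide chains across all SRP proteins."""
    return sum(len(p.chains_kda) for p in proteins)


def total_protein_mass_kda(proteins) -> int:
    """Sum of the nominal chain masses (protein moiety only)."""
    return sum(sum(p.chains_kda) for p in proteins)


if __name__ == "__main__":
    print(polypeptide_count(SRP_PROTEINS))       # number of chains
    print(total_protein_mass_kda(SRP_PROTEINS))  # nominal protein mass in kDa
```

The tally reproduces the "six polypeptide chains organized into four SRP proteins" of the text; the nominal masses sum to 236 kDa for the protein moiety.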

Recent studies revealed that different assayable functions of SRP in the
targeting process can be assigned to specific structural domains of the
particle. These separable functions include the recognition of signal
sequences and the ability of SRP to arrest specifically the translation of
nascent signal sequence-bearing proteins (Siegel & Walter 1986b). These
domains are schematically indicated in Figure 1 superimposed on the
secondary structure of 7SL RNA. This model is supported by recent
evidence demonstrating that SRP is a rod-shaped, elongated structure
(Andrews et al 1985) and that the RNAs (visualized directly by electron
spectroscopic imaging) span the entire length of the particle (D. W.
Andrews et al, submitted for publication).

SIGNAL RECOGNITION Once SRP had been purified to homogeneity it
became possible to study its activity in greater detail. Results of
experiments testing both the effects of SRP on the translation of secretory
proteins and its binding properties with various components in the
translation-translocation system have led to the model of the SRP cycle shown
in Figure 2.

In brief, SRP is thought to bind in a signal-sequence-independent
manner with relatively low affinity to biosynthetically inactive ribosomes
(Figure 2a, b) (Walter et al 1981). Upon emergence of a signal sequence as
part of the nascent polypeptide chain, the affinity of SRP for the ribosome
increases (Figure 2c); in the case of preprolactin synthesized on wheat
germ ribosomes this increase amounts to three to four orders of magnitude.
The SRP-ribosome-nascent chain complex is then targeted to the
membrane of the ER via a direct interaction of SRP with the SRP receptor
(Walter & Blobel 1981b), an integral membrane protein that is restricted
in its subcellular localization to this membrane system (Hortsch et al 1985).
At this point SRP and the SRP receptor detach from the ribosome and
can reenter the cycle, i.e. both molecules are thought to act catalytically
in the targeting process. The ribosome-nascent chain complex engages in
a functional ribosome membrane junction, and the translocation of the
nascent polypeptide proceeds (see below). (For a more detailed description
of the SRP cycle see Walter et al 1984.)

ELONGATION ARREST When SRP is included in in vitro translation systems
in the absence of microsomal membranes, it blocks protein synthesis
concomitant with the increase in its affinity for the ribosome just after
the signal peptide becomes exposed outside the large ribosomal subunit
(Walter & Blobel 1981b; Meyer et al 1982a). In some cases a discretely
sized protein fragment that corresponds to the elongation-arrested
secretory protein can be detected by gel electrophoresis; in other cases the
arrested forms appear as a broader smear on gels, which indicates that
SRP can recognize signal sequences and arrest elongation within a certain
range of chain lengths. It is also observed that some nascent polypeptides
are arrested, while others transiently pause in chain growth (P. Walter,
unpublished results). Therefore, in these latter cases arrest is often difficult
to detect (Meyer 1985). Interestingly, while elongation arrest has been
demonstrated as a kinetic delay of elongation in translation systems
reconstituted from mammalian components (K. Matlack & P. Walter,
unpublished results), the same effect is more pronounced (as a strict blockage of
elongation) when signal-bearing proteins are translated in a heterologous
wheat germ system. Thus while the general phenomenon of arrested
elongation is ubiquitous, different in vitro systems reflect it to a different degree.
Therefore it remains to be established whether SRP acts in vivo as a strict
"on-off" switch or functions as a more graded rate-controlling factor.

Two distinct biochemical approaches were employed to map the
elongation-arrest function to a separate and separable domain of SRP. One
functional domain was shown to consist of the 9/14-kDa SRP proteins and
those 7SL RNA sequences that are homologous to repetitive Alu DNA
(see Figure 1, left). One experimental approach employed single omission
experiments in which SRPs were reconstituted from fractionated and
purified protein and RNA components (Siegel & Walter 1985). A second
approach involved the preparation of a subparticle obtained after
nucleolytic dissection of SRP (Siegel & Walter 1986). These perturbed SRPs
lacking the elongation-arrest domain are still active in signal recognition
and targeting; therefore, elongation arrest cannot be a prerequisite for
protein translocation across the membrane. In the absence of elongation
arrest, however, most signal-bearing nascent proteins lose their ability to
be translocated if elongation proceeds beyond a critical point in the absence
of membranes. Thus elongation arrest seems to maintain the nascent chain
in a translocation-competent state by preventing (or delaying) its further
elongation into the cytoplasmic space and thereby adds to the fidelity of the
reaction. The particular length range in which a nascent protein remains
translocation competent may vary for different proteins (see below).

Since SRP contains an RNA as a structural component, it is tempting
to speculate that this RNA engages in base-pairing interactions with other
nucleic acids during the SRP's functional cycle. The RNA components in
the translational apparatus are likely candidates for participants in such
interactions (Walter & Blobel 1982; Zwieb 1985). However, there is at
present no direct evidence for such interactions. A possible mechanism for
elongation arrest could involve the binding of 7SL RNA to the A-site on
the ribosome, thus preventing the next amino acyl tRNA from binding.
Indeed, the secondary structure of 7SL RNA in the elongation-arrest
domain of SRP resembles that of a tRNA that is missing the anticodon
stem. In addition, the physical dimensions of SRP would easily allow the
particle to bridge the distance between the nascent chain exit site on the
ribosome (where the signal sequence emerges) and the peptidyl transferase
activity known to be located between the two ribosomal subunits (Andrews
et al 1985).

Figure 1 Domain structure of SRP (left) and the SRP receptor (right). (a) (From Siegel &
Walter 1986a): SRP is composed of two separable domains. A possible phylogenetically
conserved secondary structure for 7SL RNA is shown (Siegel & Walter 1986a). Similar
secondary structures have been proposed by Gundelfinger et al (1984), E. Ullu (personal
communication), and Zwieb (1985). Connecting lines between the RNA strands indicate
base pairs; G-U pairs are included. (For an extensive description of SRP structure see Siegel
& Walter 1986b.) Micrococcal nuclease cleaves the particle at the point indicated by arrows,
removing the elongation-arresting domain. Additional cuts mapped by Gundelfinger et al
(1983) are indicated by arrowheads. The elongation-arresting domain includes both ends of
the RNA (labeled 5' and 3') and is comprised of sequences that are homologous to the
repetitive Alu DNA sequence family. Evolutionary considerations suggest that 7SL RNA is
the parent molecule for repetitive Alu DNA (Ullu & Tschudi 1985). The thin dashed lines
indicate the boundaries of homology between 7SL RNA and an Alu consensus sequence.
The elongation-arresting domain also contains the 9/14-kDa SRP protein. The other domain,
termed SRP(S), retains signal recognition and translocation promoting function and is
comprised of the middle portion of 7SL RNA (the S-segment) and the remaining three SRP
proteins. As mentioned in the text, the 54-kDa SRP protein can be selectively cross-linked
to signal peptides and may therefore provide the signal binding pocket. (b) (From Lauffer
et al 1985): A model of the disposition of the SRP receptor α-subunit in the membrane of
the ER is shown. Putative structural and functional features as deduced from the primary
sequence (Lauffer et al 1985) are indicated. Regions I and II are putative membrane-spanning
regions; whether both of them or either one alone functions as the membrane anchor of the
receptor or if additional hydrophobic regions are contributed by the β-subunit is presently
not known. Regions III-V contain the charge clusters described in the text. The boxed
domain contains regions strongly resembling RNA binding proteins; their presence suggests
that the SRP-SRP receptor interaction may include binding of 7SL RNA to this domain.
The arrow indicates the position of the protease-sensitive site. Cleavage of the receptor at
this position results in the release of the 52-kDa cytoplasmic fragment. This fragment does
not have two properties of the intact receptor: the binding affinity for SRP and the ability
to release elongation arrest (Lauffer et al 1985; Gilmore et al 1982a).

Signal Sequences 

What constitutes the essential features of a signal sequence and how
such sequences are recognized by SRP remain unsolved problems. Signal
sequences show no recognizable primary sequence homology, and a recent
compilation shows that sequence variation can be rather extreme (von
Heijne 1985). Yet studies on a variety of systems both in vivo and in vitro
demonstrate conservation of signal sequence function over the widest
evolutionary distances (Muller et al 1982). As a consequence we are still
not able to predict with confidence which regions in proteins might function
as internal signal sequences. Nevertheless, internal signal sequences have
been demonstrated unequivocally (Bos et al 1984). Moreover, cleavage by
signal peptidase is not required for translocation (Palmiter et al 1978).

One of the few characteristic features of signal sequences is a variable
stretch of hydrophobic amino acids in the core of the sequence. Point
mutations in the hydrophobic core in bacterial signal sequences have been
shown to abolish function (Lee & Beckwith 1986, this volume). Based on
the hydrophobicity of these regions and on evidence from biophysical
studies with synthetic signal peptides (reviewed by Briggs & Gierasch
1986), it has been suggested that these sequences act as amphiphiles that
are integrated into and possibly perturb lipid bilayers. There is, however,
still no evidence that the general mechanism for translocation involves a
direct interaction of signal sequences with the hydrophobic core of the
lipid bilayer. Indeed, several lines of evidence suggest direct interactions
of signal sequences with proteins.

The clearest evidence for such interactions involves SRP. Since SRP is a
soluble ribonucleoprotein, its interactions with signal sequences can be
studied in the absence of membranes by measuring binding or by observing
the SRP-mediated modulation of protein synthesis. For example, when
signal sequences that are rich in leucine are translated in the presence
of the amino acid analog β-hydroxy-leucine, SRP signal recognition is
abolished (Walter et al 1981; Walter & Blobel 1981b). This demonstrates
that SRP directly recognizes features in the nascent chain. Moreover, the
finding conclusively rules out the possibility that sequences in the mRNA
alone are responsible for the observed effect. (After the discovery of an
RNA component in SRP the latter notion was considered attractive
because of the possibility of recognition via putative base-pairing
interactions.) Direct proof of an SRP-signal sequence interaction was recently
provided by cross-linking experiments. Two groups independently showed
that a photoactivable cross-linking reagent was selectively incorporated
into the amino-terminal region of the signal peptide for nascent
preprolactin. Each group found that the signal peptide is in direct contact with
the 54-kDa SRP protein (Kurzchalia et al 1986; Krieg et al 1986).
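
The hydrophobic core described earlier in this section can be located in a sequence by a sliding-window hydropathy scan. The sketch below is a minimal illustration, not a method used in this review: it assumes the Kyte-Doolittle hydropathy scale, an arbitrary window of eight residues, and a made-up toy sequence, and the function name is hypothetical. Consistent with the caution above, finding a hydrophobic stretch does not by itself predict a functional signal sequence.

```python
# Kyte-Doolittle hydropathy values (one-letter amino acid codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophobic_window(sequence, window=8):
    """Return (start_index, mean_hydropathy) of the most hydrophobic window.

    The window length is an illustrative assumption, not a value from the text.
    """
    if window <= 0 or len(sequence) < window:
        raise ValueError("sequence must be at least one window long")
    best_start, best_score = 0, float("-inf")
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        score = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        if score > best_score:  # keep the first window with the highest mean
            best_start, best_score = start, score
    return best_start, best_score


# Hypothetical toy sequence: charged residues flanking a leucine-rich core.
TOY_SEQUENCE = "MKKLLLLLLLLDDE"

if __name__ == "__main__":
    start, score = most_hydrophobic_window(TOY_SEQUENCE)
    print(start, TOY_SEQUENCE[start:start + 8], round(score, 2))
```

For the toy sequence the scan selects the eight-leucine stretch beginning at index 3.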

SRP Receptor 

Using the same in vitro protein translocation assays that led to the
purification of SRP, two distinct approaches were taken to identify the
corresponding membrane components involved in targeting of signal
sequence-bearing nascent chains to the ER membrane. These approaches
eventually led to the discovery and purification of the SRP receptor, the
first membrane protein proven to play a vital role in this process.



One of these approaches was based on the early observation that
proteolysis of microsomal membranes completely abolishes their protein
translocation activity but that, most importantly, the activity can be
restored by addition of an extract prepared by limited proteolysis of the
original microsomal membrane fraction (Walter et al 1979; Meyer &
Dobberstein 1980a). This proteolytic dissection and functional
reconstitution provided the assay for the purification of the protease-solubilized
component. The activity was purified as a basic 52-kDa protein (apparent
mobility on SDS PAGE is 60 kDa) (Meyer & Dobberstein 1980b), which
was subsequently demonstrated (by immunological techniques) to be a
proteolytic fragment derived from a 69-kDa integral membrane protein
(apparent mobility 72 kDa) restricted in its subcellular localization to the
endoplasmic reticulum (Meyer et al 1982b).

The second approach took advantage of the observations that, when
assayed in the absence of microsomal membranes, SRP causes a
site-specific elongation arrest in the synthesis of presecretory proteins and that
microsomal membranes contain an activity that releases the elongation
arrest. Based on these observations, the elongation-arrest-releasing activity
was predicted to reside in a membrane protein termed the SRP receptor
(Walter & Blobel 1981b) [subsequently named the docking protein (Meyer
et al 1982a)]. Fractionation of a detergent extract of microsomal
membranes employing affinity chromatography on SRP-Sepharose as a key step
allowed purification of the SRP receptor. The purified fraction contained a
predominant 69-kDa membrane protein and the arrest-releasing activity.
Using both immunological and peptide-mapping techniques, the SRP
receptor was shown to be identical to the membrane protein identified via
the proteolytic dissection methods described above (Gilmore et al 1982a,b).




Recently, the primary structure of the 69-kDa SRP receptor protein was
determined from its cognate cloned cDNA, and its relationship to the
cytoplasmic SRP receptor fragment was determined (Lauffer et al 1985).
This fragment was shown to begin with residue 152 of the intact protein.
Thus, it is sequences within the 151 amino acids at the amino terminal that
anchor the SRP receptor in the lipid bilayer. Two distinctly hydrophobic
regions have been identified that constitute putative α-helical
transmembrane segments. Since either of these segments would position a
positively charged amino acid in the hydrophobic core of the lipid bilayer,
the receptor probably interacts with other integral membrane proteins that
neutralize these charges. Recent evidence suggests the existence of proteins
that can be copurified with the 69-kDa SRP receptor protein or isolated
by affinity techniques. In particular, an ER membrane protein with an
apparent molecular weight of 30 kDa was found by a variety of techniques
to be tightly associated with the 69-kDa protein (Tajima et al 1986). Thus
the SRP receptor appears to be a heterodimeric protein that in addition
to the 69-kDa polypeptide (the SRP receptor α-subunit) contains a second
30-kDa subunit (β-subunit). Carboxy-terminal to the putative
transmembrane regions in the α-subunit is an unusually hydrophilic domain.
In particular, unusually large clusters of charged amino acids are found
surrounding the site of proteolytic cleavage that severs the 52-kDa
cytoplasmic domain (see Figure 1, right). This domain of the SRP receptor
strongly resembles nucleic acid binding proteins, which suggests that the
receptor may transiently interact directly with the 7SL RNA in SRP and
that the SRP-SRP receptor affinity could be mediated, at least in part, by
a protein-nucleic acid interaction.

The SRP receptor is unlikely to be part of the translocon itself, because
the receptor is present in the ER membrane in substoichiometric amounts
with respect to membrane-bound ribosomes. Thus it was suggested that
the SRP receptor functions "catalytically" and is recycled once correct
targeting of the ribosome has been achieved (Gilmore & Blobel 1983).
There is also evidence for an additional activity that is distinct from SRP
and the SRP receptor and may interact with the targeted signal sequence
and act as a secondary signal receptor(s) in the ER membrane (Gilmore
& Blobel 1985; Prehn et al 1980). However, a protein serving this function
has not yet been identified.

MECHANISM OF TRANSLOCATION 

Machinery 

Cell-free systems provided a detailed molecular description of the targeting
machinery, but have yet to allow insights into the molecular details of the
translocation process. In part this difficulty results from the apparent
obligate coupling of translocation and translation: Transport across the
ER membrane takes place cotranslationally; completed precursors are not
detectable in vivo in the cytoplasm. In cell-free systems translocation
proceeds only during a limited time and under the fastidious conditions
required for the synthesis of the very molecule whose translocation is being
studied. As a result, although several specific polypeptides have been
implicated as functional components of the translocon, the direct role of
any of these proteins remains to be demonstrated. For example, two
integral membrane proteins, termed ribophorins, have been suggested to
act as ribosome receptors (Kreibich et al 1978); the recent purification of
signal peptidase, a relatively abundant complex of six polypeptides,
suggests that these proteins are involved in other functions besides signal
cleavage (Evans et al 1986).

Translocation Substrates 

Although we know little about the actual machinery involved, insight
into certain aspects of the mechanism of translocation has recently been
obtained by approaches involving manipulation of the translocation
substrates. For example, expression of engineered cDNAs encoding fusion
proteins in transcription-linked translation systems demonstrated that a
signal sequence was sufficient to direct translocation of normally
cytoplasmic globin, both in vitro (Lingappa et al 1984) and in vivo (K. Simon
et al, submitted for publication). Thus, the specific information for
translocation was contained within the signal sequence and not the "passenger"
protein.

A more complex version of these experiments raised interesting
questions as to the mechanism of translocation (Perara & Lingappa 1985). The
DNA sequence coding for globin, normally a cytosolic protein, was fused
with the 5' end of the DNA sequence for preprolactin, a secretory protein
that has an amino-terminal signal sequence. This fusion protein thus
contained the preprolactin signal sequence at an internal position, 117
amino acids from the initiator methionine. When expressed in a
transcription-linked translation system, this internal signal sequence was not
only cleaved by signal peptidase, but directed the translocation of both
flanking protein domains. Surprisingly, carbonate extraction
demonstrated that neither the globin domain with the signal sequence attached
at its carboxy terminus nor the prolactin domain were integrated into the
membrane. Instead, both resided in the vesicle lumen either free or bound
to proteins. This result suggests that signal sequences are not buried in the
bilayer directly but perform their function by interacting with a
proteinaceous machinery in the membrane. Moreover, translocation of the
globin domain by a subsequently emerging signal sequence suggests that
the energy used for the globin domain's synthesis is not required for its
translocation. Thus the commonly observed coupling of translocation and
translation may not be an obligate requirement for transport across the
ER membrane.

The notion that the translocation machinery can function independently 
of protein synthesis has now received direct support from different experi- 
mental systems. 

Posttranslational Translocation in Yeast

Recently, in vitro translation-translocation systems from the yeast
Saccharomyces cerevisiae have been established (Hansen et al 1986; Waters
& Blobel 1986; Rothblatt & Meyer 1986). The precursor to the yeast
pheromone α-factor has been used as a model secretory protein. Contrary
to all expectations, this precursor, an ~18.5 kDa protein, is translocated
across yeast ER membranes posttranslationally, i.e. after it has been
completely synthesized and has been released from ribosomes. Prepro-α-factor
has no particularly hydrophobic or amphipathic stretches in its primary
sequence (other than a typical signal sequence), making it unlikely that its
posttranslational translocation is due to some passive partitioning of the
protein across the lipid bilayer. Furthermore, the posttranslational
translocation reaction is ATP-dependent and requires protein elements both in
the membrane and the soluble fraction. Whether these protein components
are related in any way to the putative yeast SRP and SRP receptor analogs
remains to be established by biochemical analysis. It is clear from these
data, however, that translocation of prepro-α-factor does not require
coupling to protein synthesis. Therefore, the translocon can, in principle,
accept its substrate posttranslationally and in the absence of the
ribosome.

c It should be kept in mind that the posttranslational translocation of 

prepro-a-factor was observed in vitro in a system artificially depleted of 
ER membranes during synthesis. This finding does not prove that prepro- 
a-factor ever crosses the ER membrane posttranslationally in vivo, where 
ER membranes are always present during translation. Rather, the actual 
degree of coupling of translocation and protein synthesis will depend 
on the relative rates of the respective processes. If targeting and trans- 
location are fast with respect to protein elongation, a strictly vectorial 
cotranslational translocation mode will result, as appears to be the 
rule in mammalian cells in vivo (Bergman & Kuehl 1979; Glabe et al 
1980). 




Posttranslational Translocation of Genetically 
Engineered Substrates 

Similar findings also emerged from the use of engineered clones in
mammalian cell-free translation systems (Perara et al 1986; Mueckler & Lodish
1986). Using a procedure that generates a truncated mRNA lacking a
termination codon, secretory polypeptide chains could be synthesized and
presented to membranes in the absence of further chain elongation while
still held by the ribosome that effects their synthesis. It was demonstrated
that such chains could be translocated and that nucleotide triphosphates
were required as the energy source for this process. In contrast to the
situation in the yeast system described above, in most of these cases
translocation could be abolished by releasing the nascent chain from
the ribosome by artificial termination with the amino acyl tRNA analog
puromycin. As expected, translocation was abolished by deletion of the
coding region for the signal sequence. In some cases, however, it was also
found that some short chains could translocate in a ribosome-independent
condition analogous to that found for prepro-α-factor in the yeast system
(E. Perara & V. R. Lingappa, submitted for publication). Thus it appears
that, at least for the proteins investigated, polypeptide chain growth
proceeds through stages in which translocation competence is a property of
the chain itself or is maintained by interaction with the ribosome (see
Figure 3).

These results show cotranslational translocation in a new light: The role
of the membrane-bound ribosome is not to extrude or push the chain
through the bilayer as suggested by some observers (Wickner & Lodish
1985). Rather, translocation is catalyzed by an energy-consuming protein
engine in the ER membrane, and the ribosome acts, in most but not all
cases, as a ligand that maintains the translocation competence of the
nascent chain.

CONCEPTS AND CONTROVERSIES

We have surveyed the development of ideas on the problem of trans- 
location of newly synthesized proteins across the ER membrane. Initially, 
attention was focused on the coupling of translocation to translation, a 
feature unique to translocation across the ER membrane. This has given 
way to the realization that obligate coupling to translation is not a pre- 
requisite for translocation and that transport across membranes of a 
variety of organelles may share common features. These include the 
involvement of a targeting receptor to discriminate among proteins 
intended for different destinations, a translocon that somehow transports
the targeted protein across the bilayer, and a requirement for energy
(derived from hydrolysis of nucleoside triphosphates or from an
electrochemical gradient) to drive translocation. The recognition of these steps
has resulted from the study of diverse proteins in a variety of organisms
and from the study of "artifacts" generated in vitro, i.e. biochemically or
genetically altered translocation machinery (Siegel & Walter 1986b) and
substrates (Perara & Lingappa 1985), whose aberrant behavior has
provided insight into fundamental details of the targeting and translocation
problem. Even as new questions emerge, many old ones (e.g. the molecular
nature of the signal sequence-receptor interaction) remain unanswered.
Other questions must now be reformulated. For example, in spite of the
recent demonstration that the translocon in the ER membranes can, in
principle, accept translocation substrates posttranslationally,
translocation most likely occurs cotranslationally in vivo. The observation that
most posttranslational translocation across the ER membrane appears
to be ribosome dependent in vitro supports this notion. As described
earlier, ribosome-independent and ribosome-dependent modes of
posttranslational translocation across the ER membrane probably reflect the
requirements for maintenance of the "translocation competent state" of
the nascent chain (see Figure 3). Loss of translocation competence may
be due to folding (aberrant or normal) or oligomerization of the protein,
or entanglement of the signal sequence with the rest of the chain such that
the resulting structure can no longer functionally interact with either the
targeting or translocation machinery. A few proteins (such as yeast
prepro-α-factor) retain translocation competence even as free, completed
polypeptides. For most proteins, however, translocation competence is
restricted to a generally narrow range of chain lengths. This range can be
extended if the polypeptide is targeted to the membrane while still attached
to the ribosome. However, eventually most proteins reach a point in chain
elongation where translocation competence is no longer maintained, even
when the protein is associated with the ribosome. One of the roles of the
SRP-induced elongation arrest may therefore be to extend the effective
range of translocation competence for the nascent polypeptide chains.

Previously, the nascent chain was thought to be vectorially translocated 
across the membrane as it emerged from the ribosome ; the finding of 
posttranslational translocation raises the possibility that the translocon 
may be sufficiently pliable to accept (partially) folded domains rather than 
exclusively linear polypeptide chains. Alternatively, the translocon may 
effect unfolding of such domains prior to translocation. In either case the 
molecular environment traversed by the protein as it passes through the 
bilayer remains to be investigated. The finding that translocation is driven 
by nucleoside triphosphate hydrolysis is a direct demonstration of a protein 




[Figure 3 labels: (A) Polysome; (B) Translocation Substrate; (C) Translocation Competence; Chain Elongation]



Figure 3 Ribosome dependence of translocation competence. This figure depicts the natural
history of the relationship of chain growth (A) to translocation competence (C). The ribosome
dependence of posttranslational translocation was assayed for various lengths of polypeptide
synthesized. Progressively shorter polypeptides were synthesized by translating mRNA
transcripts in vitro that were progressively truncated at their 3' end and therefore lacked
termination codons (Perara et al 1986; E. Perara & V. R. Lingappa, manuscript in preparation).
Ribosomes that have reached the 3' end of such a truncated mRNA appear unable to release
the newly synthesized polypeptide. Release can be artificially achieved by treatment with
puromycin. Such translocation substrates, either with or without release from the ribosomes
(as indicated in B), can be assayed for translocation competence upon presentation to a
microsomal membrane preparation in the presence of nucleoside triphosphate to supply
energy. In this assay the ribosome dependence or independence of the translocation
competence is reflected in the ability or inability of puromycin pretreatment to abolish
translocation by releasing the chain from the ribosome (see right arms of branched arrows). (A)
depicts three ribosomes on a polysome at various stages (I, II, and III) during the synthesis
of a hypothetical secretory polypeptide chain. In (C) translocation competence as assayed
posttranslationally (see above) is indicated (+). At stage I, the nascent chain is translocation
competent, and this competence is independent of the presence of the ribosome, as
experimentally demonstrated. As chain growth proceeds, the polypeptide enters stage II where its
translocation competence requires the ribosome. Finally, late in chain growth (stage III) the
chain is no longer competent to interact with receptors and other proteins involved in
translocation. Whether loss of translocation competence in stage III involves a loss of
targeting function or loss of a productive interaction with the translocon remains to be
determined. It is not known whether SRP is required for posttranslational translocation in
either case.




engine in the membrane and rules out a spontaneous process previously 
suggested (Wickner 1979; Engelman & Steitz 1980). It remains to be 
established how the energy of hydrolysis is used by the translocon. 

Old controversies regarding co- versus posttranslational translocation 
appear to be resolved. In retrospect it could be concluded that many 
prokaryotic proteins (targeted to the plasma membrane) do not require 
ribosomes to maintain their translocation competence. This also appears 
to be the case for all proteins (so far studied) that are translocated across 
the peroxisomal membrane and the mitochondrial and chloroplast en- 
velopes. The most challenging problems for future research now include 
the further fractionation and purification of all the essential, as well as 
modulatory, components of the targeting and translocation machinery. 
This should ultimately allow their reconstitution in in vitro systems for 
the mechanistic analysis of their functions. Finally, our goal must be the 
understanding of how these components function in vivo. This should 
include elucidation of the regulatory or homeostatic mechanisms involved 
in harnessing such a remarkable set of protein machines as the translocons. 



Acknowledgments 

We wish to thank David Andrews, Patricia Hoben, and Leander Lauffer
for many helpful comments on the manuscript. This work was supported
by NIH grants GM-32384 to PW and GM-31626 to VRL. PW is a recipient
of support from the Chicago Community Trust/Searle Scholars Program. 



Literature Cited 

Andrews, D. W., Walter, P., Ottensmeyer, F. P. 1985. Proc. Natl. Acad. Sci. USA 82: 785-89
Bergman, L. W., Kuehl, L. M. 1979. J. Biol. Chem. 254: 8869-76
Blobel, G. 1980. Proc. Natl. Acad. Sci. USA 77: 1496-1500
Blobel, G., Dobberstein, B. 1975. J. Cell Biol. 67: 835-51
Bos, T. J., Davis, A. R., Nayak, D. P. 1984. Proc. Natl. Acad. Sci. USA 81: 2337-41
Briggs, M. S., Gierasch, L. M. 1986. Adv. Protein Chem. 38: In press
Engelman, D. M., Steitz, T. A. 1981. Cell 23: 411-22
Erickson, A. H., Walter, P., Blobel, G. 1983. Biochem. Biophys. Res. Commun. 115: 275-80
Evans, E., Gilmore, R., Blobel, G. 1986. Proc. Natl. Acad. Sci. USA 83: 581-85
Gilmore, R., Blobel, G., Walter, P. 1982a. J. Cell Biol. 95: 463-69
Gilmore, R., Blobel, G. 1983. Cell 35: 677-85
Gilmore, R., Blobel, G. 1985. Cell 42: 497-505
Gilmore, R., Walter, P., Blobel, G. 1982b. J. Cell Biol. 95: 470-77
Glabe, C. G., Hanover, J. A., Lennarz, W. J. 1980. J. Biol. Chem. 255: 9236-42
Gundelfinger, E. D., Carlo, M. D., Zopf, D., Melli, M. 1984. EMBO J. 3: 2325-32
Gundelfinger, E. D., Krause, E., Melli, M., Dobberstein, B. 1983. Nucleic Acids Res. 11: 7363-73
Hansen, W. B., Garcia, P. D., Walter, P. 1986. Cell 45: 397-406
Hortsch, M., Griffiths, G., Meyer, D. I. 1985. Eur. J. Cell Biol. 38: 271-79
Katz, F. N., Rothman, J. E., Lingappa, V. R., Blobel, G., Lodish, H. F. 1977. Proc. Natl. Acad. Sci. USA 74: 3278-82
Kreibich, G., Freienstein, C. M., Pereyra, B. N., Ulrich, B. L., Sabatini, D. D. 1978. J. Cell Biol. 77: 488-506
Krieg, U., Walter, P., Johnson, A. 1986. Proc. Natl. Acad. Sci. USA. In press
Kurzchalia, T. V., Wiedmann, M., Girshovich, A. S., Bochkareva, E. S., Bielka, H., Rapoport, T. A. 1986. Nature 320: 634-36
Lauffer, L., Garcia, P. D., Harkins, R. N., Coussens, L., Ullrich, A., Walter, P. 1985. Nature 318: 334-38
Lee, C., Beckwith, J. 1986. Ann. Rev. Cell Biol. 2: 315-36
Lingappa, V. R., Chaidez, J., Yost, C. S., Hedgpeth, J. 1984. Proc. Natl. Acad. Sci. USA 81: 456-60
Meyer, D. I. 1985. EMBO J. 4: 2031-33
Meyer, D. I., Dobberstein, B. 1980a. J. Cell Biol. 87: 498-502
Meyer, D. I., Dobberstein, B. 1980b. J. Cell Biol. 87: 503-8
Meyer, D. I., Krause, E., Dobberstein, B. 1982a. Nature 297: 647-50
Meyer, D. I., Louvard, D., Dobberstein, B. 1982b. J. Cell Biol. 92: 579-83
Mueckler, M., Lodish, H. F. 1986. Cell 44: 629-37
Müller, M., Ibrahimi, I., Chang, C. N., Walter, P., Blobel, G. 1982. J. Biol. Chem. 257: 11860-63
Palade, G. 1975. Science 189: 347-58
Palmiter, R. D., Gagnon, J., Walsh, K. A. 1978. Proc. Natl. Acad. Sci. USA 75: 94-98
Perara, E., Lingappa, V. R. 1985. J. Cell Biol. 101: 2292-2301
Perara, E., Rothman, R. E., Lingappa, V. R. 1986. Science 232: 348-52
Prehn, S., Nürnberg, P., Rapoport, T. A. 1980. Eur. J. Biochem. 107: 185-95
Redman, C. M., Sabatini, D. D. 1966. Proc. Natl. Acad. Sci. USA 56: 608-15
Rothblatt, M., Meyer, D. I. 1986. Cell 44:
Siegel, V., Walter, P. 1985. J. Cell Biol. 100: 1913-21
Siegel, V., Walter, P. 1986a. Nature 320: 81-84
Siegel, V., Walter, P. 1986b. In Genetic Engineering, Vol. 8, pp. 179-94. ed. J. K. Setlow. New York: Plenum
Tajima, S., Lauffer, L., Rath, V., Walter, P. 1986. J. Cell Biol. 103: In press
Ullu, E., Tschudi, C. 1985. Nature 312: 171-72
von Heijne, G. 1985. J. Mol. Biol. 184: 99-105
Walter, P., Blobel, G. 1980. Proc. Natl. Acad. Sci. USA 77: 7112-16
Walter, P., Blobel, G. 1981a. J. Cell Biol. 91: 551-56
Walter, P., Blobel, G. 1981b. J. Cell Biol. 91: 557-61
Walter, P., Blobel, G. 1982. Nature 299: 691-98
Walter, P., Blobel, G. 1983. Cell 34: 525-33
Walter, P., Gilmore, R., Blobel, G. 1984. Cell 38: 5-8
Walter, P., Ibrahimi, I., Blobel, G. 1981. J. Cell Biol. 91: 545-50
Walter, P., Jackson, R. C., Marcus, M. M., Lingappa, V. R., Blobel, G. 1979. Proc. Natl. Acad. Sci. USA 76: 1796-99
Warren, G., Dobberstein, B. 1978. Nature 273: 569-71
Waters, G., Blobel, G. 1986. J. Cell Biol. 102: 1543-50
Wickner, W. T. 1979. Ann. Rev. Biochem. 48: 23-45
Wickner, W. T., Lodish, H. F. 1985. Science 230: 400-7
Zwieb, C. 1985. Nucleic Acids Res. 13: 6105-24






Annual Review of Cell Biology 
Volume 2, 1986 



CONTENTS 



Activation of Sea Urchin Gametes, James S. Trimmer and Victor D. Vacquier

Cell-Matrix Interactions and Cell Adhesion During Development, Peter Ekblom, Dietmar Vestweber, and Rolf Kemler 27

Spatial Programming of Gene Expression in Early Drosophila Embryogenesis, Matthew P. Scott and Patrick H. O'Farrell 49

Cell Adhesion Molecules in the Regulation of Animal Form and Tissue Pattern, Gerald M. Edelman 81

Core Particle, Fiber, and Transcriptionally Active Chromatin Structure, D. S. Pederson, F. Thoma, and T. Simpson 117

The Role of Protein Kinase C in Transmembrane Signalling, Ushio Kikkawa and Yasutomi Nishizuka 149

Proton-Translocating ATPases, Qais Al-Awqati 179

Region-Specific Cell Activities in Amphibian Gastrulation, John Gerhart and Ray Keller 201

T-Cell Activation, H. Robson MacDonald and Markus Nabholz 231

Anchoring and Biosynthesis of Stalked Brush Border Membrane Proteins: Glycosidases and Peptidases of Enterocytes and Renal Tubuli, Giorgio Semenza 255

Cotranslational and Posttranslational Protein Translocation in Prokaryotic Systems, Catherine Lee and Jon Beckwith 315

The Directed Migration of Eukaryotic Cells, S. Singer and Abraham Kupfer 337

Protein Import into the Cell Nucleus, Colin Dingwall and Ronald A. Laskey 367

G Proteins: A Family of Signal Transducers, Lubert Stryer and Henry R. Bourne 391

Microtubule-Associated Proteins, J. B. Olmsted 421

Structure and Function of Nuclear and Cytoplasmic Ribonucleoprotein Particles, Gideon Dreyfuss 459

Mechanism of Protein Translocation Across the Endoplasmic Reticulum Membrane, Peter Walter and Vishwanath R. Lingappa 499

Regulation of the Synthesis and Assembly of Ciliary and Flagellar Proteins During Regeneration, Paul A. Lefebvre and Joel L. Rosenbaum 517

Indexes

Subject Index 547
Cumulative Index of Contributing Authors, Volumes 1-2 557
Cumulative Index of Chapter Titles, Volumes 1-2 558





Exhibit 4 1 



The Journal of Biological Chemistry

© 2000 by The American Society for Biochemistry and Molecular Biology, Inc.

Vol. 275, No. 1, Issue of January 7, pp. 378-385, 2000

Printed in U.S.A.



Mutational Analysis of the Primary Substrate Specificity Pocket of 
Complement Factor B 

Asp226 IS A MAJOR STRUCTURAL DETERMINANT FOR P1-ARG BINDING*

(Received for publication, July 30, 1999, and in revised form, October 12, 1999)

Yuanyuan Xu‡§, Antonella Circolo‡, Hua Jing¶‖, Yue Wang‡, Sthanam V. L. Narayana¶,
and John E. Volanakis‡**

From the ‡Division of Clinical Immunology and Rheumatology, Department of Medicine and ¶Center for Macromolecular
Crystallography, University of Alabama at Birmingham, Birmingham, Alabama 35294, the ‖Center for Blood Research,
Harvard Medical School, Boston, Massachusetts 02138, and the **Biomedical Sciences Research Center "A. Fleming,"
Vari 166 72, Greece



Factor B is a serine protease, which despite its trypsin-like
specificity has Asn instead of the typical Asp at
the bottom of the pocket (position 189,
chymotrypsinogen numbering). Asp residues are present at
positions 187 and 226 and either one could conceivably
provide the negative charge for binding the P1-Arg of
the substrate. Determination of the crystal structure of
the factor B serine protease domain has revealed that
the side chain of Asp226 is within the pocket, whereas
Asp187 is located outside the pocket. To investigate the
possible role of these atypical structural features in
substrate binding and catalysis, we constructed a panel of
mutants of these residues. Replacement of Asp187 caused
moderate (50-60%) decrease in hemolytic activity,
compared with wild type factor B, whereas replacement of
Asn189 resulted in more profound reductions (71-95%).
Substitutions at these two positions did not
significantly affect assembly of the alternative pathway C3
convertase. In contrast, elimination of the negative
charge from Asp226 completely abrogated hemolytic
activity and also affected formation of the C3 convertase.
Kinetic analyses of the hydrolysis of a P1-Arg containing
thioester by selected mutants confirmed that residue
Asp226 is a primary structural determinant for P1-Arg
binding and catalysis.



Complement is a major effector system of host defense.
Activation of complement leads to the generation of protein
fragments and protein-protein complexes that mediate acute
inflammatory responses, phagocytosis and killing of pathogens,
and regulation of adaptive immune responses.
Activation-associated production of biologically active protein fragments is
catalyzed by a group of eight atypical complement serine
proteases (SPs)¹ of the chymotrypsin superfamily (1).
Understanding the structural basis for the highly restricted proteolytic
activity of these SPs is an important first step toward
pharmacologic control of complement activation (2).

Members of the chymotrypsin family have very similar
three-dimensional structures but distinct substrate
specificities. To a great extent specificity is determined by the side
chains of the amino acid residues that line up the primary
substrate specificity pocket (S1 site). The pocket has three walls
formed by residues 189-195, 214-220, and 225-228
(chymotrypsinogen numbering has been used for all SPs or SP domains
throughout this paper) (3). The presence at the bottom of the
pocket of Asp189 endows trypsin with preference for positively
charged Arg and Lys residues (4, 5), whereas in chymotrypsin
the specificity for bulky aromatics is largely determined by
Ser189 (6). Residues at position 216 and 226 also contribute to
substrate specificity (7). All complement SPs exhibit
trypsin-like specificity for positively charged Arg residues and all have
an Asp at position 189, except for factor B and C2 (Fig. 1).

Factor B and C2 are structurally similar modular proteins
that play a central role in complement activation by providing
the catalytic subunits of two key enzymes, namely the C3/C5
convertases of the alternative and the classical pathway,
respectively. Complement convertases cleave the same single
peptide bonds in C3 and C5. In addition to having Asn and Ser,
respectively, instead of Asp at position 189, factor B and C2
also lack the highly conserved free N-terminal sequence of SPs.
In typical SPs, the N-terminal sequence constitutes an
essential structural element largely responsible for the transition
from zymogen to active enzyme (8). Full expression of the
proteolytic activities of factor B and C2 only occurs in the
context of the complexes, C3bBb(C3b) and C4b2a(C3b),
respectively (9). The SP domain resides in the C-terminal half of Bb
or C2a and is preceded by a von Willebrand factor type A
module (VWFA) which is noncovalently associated with C3b or
C4b, respectively, in a Mg2+-dependent manner. These
atypical structural features of factor B and C2 indicate a novel
activation mechanism and probably also a distinct substrate
binding arrangement at the primary specificity pocket.

In addition to their natural protein substrates C3 and C5,
factor B and C2 and their fragments Bb and C2a hydrolyze a
small number of C3- and C5-like synthetic substrates (11-14).
Overall, C3-like substrates are considerably more reactive than
C5-like substrates. However, even toward their best
substrates, the kcat/Km values of factor B, Bb, C2, and C2a are

* This work was supported by National Institutes of Health Grants
AI21067 (to J. E. V.), NIAMS, National Institutes of Health Grant P60
AR20614 R-3 (to Y. X.), and National Institutes of Health Grant
AI39818 (to S. L. V. N.). The costs of publication of this article were
defrayed in part by the payment of page charges. This article must
therefore be hereby marked "advertisement" in accordance with 18
U.S.C. Section 1734 solely to indicate this fact.

§ To whom correspondence should be addressed: THT, Rm. 437, 1900
University Blvd., Div. of Clinical Immunology and Rheumatology, Dept.
of Medicine, Birmingham, AL 35294. Tel.: 205-975-6241; Fax: 205-934-2126;
E-mail: rheu019@uabdpo.dpo.uab.edu.

¹ The abbreviations used are: SP, serine protease; B-SP, the factor B
serine protease domain; cCOLL, fiddler crab collagenase; CCP,
complement control protein module; CHO, Chinese hamster ovary; CoVF,
cobra venom factor; EC3b, erythrocytes sensitized with C3b; hnELA,
human neutrophil elastase; hPRO3, human protease 3; mAb,
monoclonal antibody; SBzl, thiobenzyl; VWFA, von Willebrand factor type A
module; wt, wild type; Z, benzyloxycarbonyl; PAGE, polyacrylamide gel
electrophoresis.



378 



This paper is available on line at http://www.jbc.org 



Primary Specificity Pocket of Factor B 



379 



Fig. 1. Alignment of partial amino acid sequences of factor B, C2,
chymotrypsin, and trypsin. Residues that form the walls of the primary
specificity pocket are shaded. The catalytic triad residue Ser195 is boxed
and marked by an asterisk. Arrows indicate residues targeted for
site-directed mutagenesis. Numbers at the top are for residues of the
chymotrypsinogen sequence and those at the bottom are for the factor B
sequence. CHT, bovine chymotrypsin; TRP, bovine trypsin; HC2, human C2;
HFB, human factor B.



[Fig. 1 sequence alignment: rows CHT, TRP, HC2, HFB; chymotrypsinogen numbering above, factor B numbering below. The residue-level alignment is not recoverable from the scan.]


about 3 orders of magnitude lower than the 7.8 × 10^[exponent illegible]
value measured under the same conditions for the hydrolysis of
the most reactive thioester by trypsin (14). By comparison, the
catalytic efficiency (kcat/Km) of C3bBb for C3 cleavage was
reported to be 3.1 × 10^[exponent illegible] M^-1 s^-1 (10). No natural serine
protease inhibitor has been found for factor B or C2 and regulation
of the proteolytic activity of C3 convertases is effected largely
through control of the assembly and decay of the bimolecular
complexes. The structural correlates of the low esterolytic
activity and extremely restricted substrate specificity as well as
the conformational change(s) associated with zymogen
activation are not understood. Determination of the structure of the
factor B serine protease domain (B-SP) at 2.1-Å resolution has
revealed the expected chymotrypsin fold but also unique
features of surface loops and of the oxyanion hole.² The backbone
conformation of the pocket is similar to that of trypsin, but
there are substitutions of functionally important residues. In
this study we used site-directed mutagenesis to analyze
possible effects of the factor B-specific residues on the assembly and
activity of the C3 convertase. The data indicate that Asp226 is
a primary structural determinant of P1-Arg binding and that
the native conformation of Asp187 and Asn189 are important
determinants for C3 cleavage.

EXPERIMENTAL PROCEDURES 

Construction of Mutant Factor B cDNA — The factor B cDNA clone 
BHL4-1 (15) in the expression vectors pRc/CMV or pcDNA3 (Invitrogen,
Carlsbad, CA) was used as wild type (wt) template in site-directed
mutagenesis. Factor B mutant cDNA constructs were obtained by the
method of Zoller and Smith (16) as modified by Kunkel (17). Alternatively,
the QuikChange Site-directed mutagenesis kit (Stratagene, La Jolla, CA)
was used according to the manufacturer's protocol. All cDNA constructs of
mutant factor B were verified by restriction mapping and
dideoxynucleotide sequencing (18) of the region around the mutation. Oligonucleotides
were synthesized by the phosphoramidite method (19), using a DNA/RNA
synthesizer (Model 394, Applied Biosystems, Foster City, CA).

Expression of wt and Mutant Factor B cDNA — Transient transfection
of COS cells with 30-40 µg of cDNA was performed by electroporation
as described (20). Cell culture supernatant containing secreted factor B
proteins was harvested 72-90 h after transfection. Cell debris was
removed by centrifugation and the supernatant was stored frozen at
-80 °C in small aliquots. The concentration of recombinant factor B in
the medium was measured by enzyme-linked immunosorbent assay
(15), using a rabbit anti-human Bb IgG (50 µg/ml) as capturing
antibody and the mouse anti-Ba monoclonal antibody (mAb) HA4-1D5 (1.5
µg/ml) as reporter. The assay was developed with 1:1000 dilution of
affinity-purified goat anti-mouse IgG1 alkaline phosphatase conjugate
(Southern Biotechnology Associates, Birmingham, AL) and Sigma
substrate 104 (Sigma). Color development was measured at 405 nm. The
concentration of factor B was calculated from a standard curve
constructed using human serum of known factor B concentration. The
sensitivity of the assay was approximately 1-2 ng/ml and the
concentration of specific protein in the culture medium ranged from 0.3 to 2
µg/ml.



² Jing, H., Xu, Y., Carson, M., Moore, D., Macon, K. J., Delucas, L. J.,
Volanakis, J. E., and Narayana, S. V. L. (2000) EMBO J. 20, in press.



To obtain large amounts of recombinant proteins, stable transfection
of Chinese hamster ovary cells (CHO-K1, ATCC) was carried out with
selected mutants by a modification of a previously described method
(21). CHO-K1 cells were maintained in Ham's F-12 (Cellgro, Herndon,
VA) supplemented with 10% heat-inactivated fetal bovine serum (Life
Technologies, Grand Island, NY), and 2 mM glutamine at 37 °C in a
humidified, 5% CO2 incubator. Forty micrograms of each CsCl-purified
plasmid DNA was transfected into 4-6 × 10^6 CHO-K1 cells by
electroporation as described (21). Selection of neomycin-resistant cells was
started 72 h after transfection with 750 µg of G418 (Cellgro) per ml of
the above medium. Subcloning of the G418-resistant cells was
performed approximately 7 days after initiating selection by limiting
dilution of cells at 0.8 cell/well in 96-well tissue culture plates. Clones were
allowed to grow in G418-containing medium with 15% heat-inactivated
fetal bovine serum for 10-12 days before screening for factor B
production by enzyme-linked immunosorbent assay. The highest producing wt
and mutant factor B clones were selected, expanded, and adapted to
large-scale production by growing in suspension culture for 2 weeks.
Protein purification was facilitated by culturing cells in ExCell 301
serum-free medium (JRH Bioscience, Lenexa, KS) supplemented with
0.5-2% fetal bovine serum, 2 mM glutamine, and 200 µg/ml G418.
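
The limiting-dilution density of 0.8 cell/well in 96-well plates can be related to the expected clonality of the resulting wells with a short calculation. The sketch below is an illustration only and is not part of the published method; it assumes that cells distribute across wells according to a Poisson distribution and that every seeded cell grows out, and the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k cells in a well for a given mean cells/well."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def limiting_dilution_summary(mean_cells_per_well: float, wells: int = 96) -> dict:
    """Summarize expected well occupancy under the Poisson assumption."""
    p_empty = poisson_pmf(0, mean_cells_per_well)
    p_single = poisson_pmf(1, mean_cells_per_well)
    p_multi = 1.0 - p_empty - p_single   # two or more cells in a well
    p_growth = 1.0 - p_empty             # at least one cell in a well
    return {
        "p_empty": p_empty,
        "p_single": p_single,
        "p_multi": p_multi,
        "expected_wells_with_growth": wells * p_growth,
        # Share of growing wells that started from exactly one cell.
        "fraction_of_growing_wells_clonal": p_single / p_growth,
    }


# 0.8 cell/well and 96 wells are the values stated in the text.
summary = limiting_dilution_summary(0.8, wells=96)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions roughly 45% of wells receive no cell, and about two-thirds of the wells that show growth derive from a single cell, which is why clones obtained at this density are usually treated as likely, but not guaranteed, to be monoclonal.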

Purification of Recombinant wt and Mutant Factor B — One to two 
liters of the stably transfected CHO cell culture medium were
harvested, concentrated to approximately 150 ml, and applied to a 30-ml
column of CM Sephadex C-50 equilibrated with 0.1 M sodium acetate,
20 mM ε-amino-n-caproic acid, 20 mM EDTA, pH 6.5. Factor B was
eluted with a gradient of 0-0.2 M NaCl in the starting buffer. For
further purification, factor B-containing pools were dialyzed against 20
mM Tris-HCl, pH 8.0, and subjected to fast protein liquid
chromatography, using a Mono-Q column (Amersham Pharmacia Biotech). Factor B
was eluted with a gradient of 0-0.3 M NaCl in the starting buffer. For
some mutants Mono-Q chromatography was repeated. Purity of factor B
proteins assessed by 10% SDS-PAGE was between 80 and 95%.

Reactivity of Factor B Mutants with Module-specific mAbs — Two
anti-Ba mAbs, HA4-1D5 (a subclone of HA4-1A) and FD3-20, and an
anti-Bb mAb, HA4-15, were described previously (22). The mAb 6B3.3
was raised by using as antigen recombinant factor B VWFA module
expressed in Escherichia coli. Reactivity of factor B mutants with these
mAbs was examined by enzyme-linked immunosorbent assay similar to
that described above. The same rabbit anti-human Bb IgG antibody was
used in the solid phase, and each of the four mAbs was used as
detectant at a concentration of 1.5 μg/ml. The assay was developed with
goat anti-mouse IgG + IgM alkaline phosphatase conjugate (Jackson
Immunoresearch Laboratory, Inc., West Grove, PA) and phosphatase
substrate Sigma 104. Values obtained for each mAb were normalized to
those measured for HA4-1D5 and represent the average of two
separate experiments.

Solid-phase Cobra Venom Factor (CoVF) Binding Assay — Binding of
wt and mutant factor B to CoVF was determined by enzyme-linked
immunosorbent assay as described (23). Culture medium from
transfected COS cells containing wt or mutant factor B was dialyzed against
half-strength veronal-buffered saline (0.5 × veronal-buffered saline, 2.5
mM sodium 5,5-diethyl barbiturate, pH 7.4) containing 5 mM MgCl2 at
4 °C overnight. Serial dilutions of factor B in the same buffer were then
added to microplates coated with CoVF (Quidel, San Diego, CA).
Binding of factor B to CoVF was allowed to occur in the absence or presence
of 1.5 μg/ml factor D at 37 °C for 2 h. Bound factor B or Bb was detected
with rabbit anti-Bb IgG (50 μg/ml) and goat anti-rabbit IgG alkaline
phosphatase conjugate. Results represent the average values of two
separate experiments.

CoVF-mediated Factor B Cleavage by Factor D — COS cells (4-6 ×






Primary Specificity Pocket of Factor B 



10^6) were transiently transfected by electroporation with wt or mutant
factor B cDNA as described above. The cells were metabolically labeled
72 h later in 1 ml of Dulbecco's modified Eagle's medium without
methionine, supplemented with 250 μCi of [35S]Met (specific activity —
1000 Ci/mmol, Amersham Pharmacia Biotech or ICN Radiochemical,
Irvine, CA) for 30 min and chased with cold methionine in Dulbecco's
modified Eagle's medium supplemented with 10% heat-inactivated
fetal bovine serum. After a 3-h chase, 650-μl aliquots of the culture
supernatants were collected, supplemented with 25 mM Tris-HCl, pH
7.4, 2.5 mM MgCl2, and incubated for 2 h at 37 °C with factor D (300 and
2 ng) in the absence or presence of 5 μg of CoVF. Labeled factor B and
Bb were immunoprecipitated by using rabbit anti-Bb IgG antibody and
Staphylococcus aureus protein A and analyzed by SDS-PAGE as
described (24). To assess factor B cleavage, gel slices corresponding to the
autoradiographed bands and blank spaces were cut and digested with
15% H2O2 at 56 °C overnight. The blank gel cuts were used to subtract
background radioactivity. The released radioactivity was measured
with Bio Safe II scintillation fluid (RPI, Mount Prospect, IL) in an LKB
liquid scintillation counter (Model 1215 LKB, Gaithersburg, MD) (25).

Factor B Hemolytic Assay — Sheep blood erythrocytes carrying C3b
(EC3b) were prepared as described (22), by using freshly purified
human factor B (22), factor D (26), and C3 (27). Serial dilutions of culture
medium containing wt or mutant factor B were added to 7.5 x 10^
EC3b, 12.5 ng of factor D, and 125 ng of properdin (Sigma) in a total
volume of 150 μl in 0.5 × veronal-buffered saline containing 2.5%
dextrose, 2.5 mM MgCl2, 10 mM EGTA, and 0.1% gelatin. Formation of
C3 convertase, C3bBb(P), was carried out at 30 °C for 30 min. Then, 0.5
ml of guinea pig serum diluted 1:40 with 10 mM EDTA in
veronal-buffered saline was added as source of C3 to C9 and the reaction
mixture was incubated for 1 h at 37 °C. Percent lysis and hemolytic
units/μg were calculated as described (28). Values of specific hemolytic
activity of each mutant were normalized to that of wt factor B and
represent the mean ± S.E. of at least three independent
determinations, each performed in duplicate.
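The normalization and the mean ± S.E. reported for the hemolytic assay can be sketched as follows. This is only an illustration: it assumes that each mutant value is divided by the wt value measured in the same experiment, which the text does not spell out, and the numbers and function name are hypothetical.

```python
import statistics

def normalized_mean_se(mutant_units_per_ug, wt_units_per_ug):
    """Normalize paired mutant/wt specific activities; return (mean, standard error).

    Assumption (not from the source): each mutant value is paired with the
    wt value obtained in the same experiment.
    """
    ratios = [m / w for m, w in zip(mutant_units_per_ug, wt_units_per_ug)]
    mean = statistics.mean(ratios)
    # standard error = sample standard deviation / sqrt(n)
    se = statistics.stdev(ratios) / len(ratios) ** 0.5
    return mean, se

# Hypothetical specific activities (units/ug) from three independent determinations
mutant = [40.0, 50.0, 45.0]
wt = [100.0, 100.0, 100.0]
mean, se = normalized_mean_se(mutant, wt)
print(f"relative hemolytic activity = {mean:.2f} +/- {se:.3f}")
```

At least two determinations are needed for the sample standard deviation to be defined, which is compatible with the "at least three independent determinations" stated above.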

C3 Cleavage Assay — C3 was freshly isolated from plasma of a normal
individual as described (27) except that a final chromatographic step
using hydroxyapatite fast protein liquid chromatography (Amersham
Pharmacia Biotech) was added. Purified wt or mutant factor B (50 ng)
was mixed with C3 (75 ng) with or without 150 ng of CoVF and 12.5 ng
of factor D in a total volume of 25 μl of 25 mM Tris-HCl, pH 7.4,
containing 75 mM NaCl and 5 mM MgCl2. After incubating at 37 °C for
1 h, 10 μl of each reaction mixture was analyzed on 7.5% SDS-PAGE.

C3 and C3 fragments were detected on Western blots by using goat
anti-human C3 IgG (Cappel, Durham, NC) and affinity-purified rabbit
anti-goat IgG F(ab)'2 horseradish peroxidase conjugate (ICN). The ECL
luminescent detection system (Amersham Pharmacia Biotech) was
utilized to visualize C3 polypeptide chains following the manufacturer's
protocol. The amount of C3 conversion was determined by scanning the α
and α' chains using a ScanMaker 5 scanner (Microtek Lab, Inc., Redondo
Beach, CA) and band intensity was quantified using the software
NIH Image 1.58.

Esterolytic Assays — The rate of hydrolysis of Z-Lys-Arg-SBzl
(Peninsula Laboratories, Belmont, CA) was measured by a modification of the
method of Kam et al. (14). Assays were carried out in microplate wells.
The B-SP was expressed by Sf9 insect cells infected by recombinant
baculovirus and isolated from the serum-free Excell 401 media using
Bio-Rex 70 and Mono S ion exchange chromatography.^ The
recombinant B-SP consists of a vector-derived tripeptide Ala-Asp-Pro at the N
terminus and the C-terminal 295 amino acid residues of factor B.
Purified factor B or B-SP (0.11-0.2 μM) was added to 0.08 to 0.8 mM
Z-Lys-Arg-SBzl and 1.6 mM Ellman's reagent
5,5'-dithiobis-(2-nitrobenzoic acid) (Sigma) in 250 μl of 0.1 M HEPES, pH 7.5, containing 0.5 M
NaCl and 16% Me2SO. Factor B was omitted from control wells used for
measuring background hydrolysis of the substrate. Esterolytic rates
were measured kinetically for 15 min by using a Vmax kinetic microplate
reader (Molecular Devices, Menlo Park, CA). Kinetic constants were
determined by the Lineweaver-Burk method based on at least five
substrate concentrations. Correlation coefficients in all cases were
greater than 0.98.
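The Lineweaver-Burk determination mentioned above amounts to a straight-line fit of 1/v against 1/[S], with slope Km/Vmax and intercept 1/Vmax. The following is a minimal sketch of that fit, not the authors' actual procedure: the function name, the kinetic constants, and the synthetic rates are hypothetical, and an ordinary unweighted least-squares fit is assumed.

```python
def lineweaver_burk(substrate_conc, rates):
    """Unweighted least-squares fit of 1/v against 1/[S].

    Returns (Km, Vmax, r), where r is the correlation coefficient of the
    double-reciprocal plot.
    """
    xs = [1.0 / s for s in substrate_conc]
    ys = [1.0 / v for v in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals Km / Vmax
    intercept = mean_y - slope * mean_x    # equals 1 / Vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    r = sxy / (sxx * syy) ** 0.5
    return km, vmax, r

# Hypothetical constants; five substrate concentrations within 0.08-0.8 mM
true_km, true_vmax = 0.4, 2.0
conc = [0.08, 0.16, 0.32, 0.56, 0.8]
# Synthetic, noise-free rates generated from the Michaelis-Menten equation
rates = [true_vmax * s / (true_km + s) for s in conc]

km, vmax, r = lineweaver_burk(conc, rates)
print(f"Km = {km:.3f} mM, Vmax = {vmax:.3f}, r = {r:.4f}")
```

Dividing Vmax by the enzyme concentration gives kcat, and kcat/Km is the catalytic efficiency discussed in the Results. With real, noisy data the double-reciprocal transform weights low-substrate points heavily, which is one reason to check the correlation coefficient as the authors did.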

RESULTS 

To understand the structural implications of the unique factor
B residues in and around the primary specificity pocket, the
serine protease domain (B-SP) was expressed using a baculovirus
system and its crystal structure determined at 2.1-Å
resolution by multiple isomorphous and molecular replacement
methods.''^ As expected, B-SP was found to display a
chymotrypsin-like, two β-barrel structural fold. In the active center,
the catalytic triad residues, Asp102, His57, and Ser195, and the
nonspecific substrate-binding site (Ser-Trp-Gly, residues 214-216) have
typical serine protease configurations (Fig. 2). However, the
oxyanion hole displays a zymogen-like conformation due to the
inward orientation of the carbonyl oxygen atom of Arg192, the
backbone of which together with those of Cys191, Gly193, and
Asp194 form a single-turn 3₁₀ helix. The three walls of the
primary specificity pocket are formed by residues 189-195,
214-220, and 225-228. The backbones of these residues, except
for the single-turn helix, can be superposed on those of the
corresponding residues of trypsin. Asn189 is located at the
bottom of the pocket, replacing the highly conserved Asp of other
SPs with trypsin-like substrate specificity. However, the side
chain of Asp226, which replaces Gly226 of trypsin, extends
toward the bottom of the pocket, which suggests that it may be
directly involved in binding the P1-Arg of the substrate,
substituting for Asp189 of other trypsin-like SPs. An Asp residue also
replaces a conserved Gly of other SPs at position 187. Asp187 of
factor B is located directly beneath the pocket and forms a salt
bridge with Lys^®^. To investigate the possible participation of
the three residues, Asp187, Asn189, and Asp226, in substrate
binding and catalysis, factor B mutants at these positions were
constructed and assayed. In addition, the functional role of
Pro188, not found at this position in other SPs, was also
assessed. In most cases, two independent clones for each mutant
were expressed and analyzed to avoid artifactual results. In all
cases, results of functional analysis of the two clones of each
mutant were consistent. This suggested that functional
differences from the wt resulted from the amino acid substitution at
the mutation sites.

Reactivity of Factor B Mutants with Module-specific
mAbs — To probe for possible effects of the mutations on the
overall structure of the molecule, we tested the reactivity of the
mutants with a panel of module-specific mAbs. The anti-Bb
mAb HA4-15 (22) has been shown to recognize an epitope on
the SP domain (data not shown). MAbs FD3-20 (anti-CCP1-3)
and HA4-1D5 (anti-CCP2) bind to distinct epitopes on the Ba
fragment (29), while 6B3.3 (γ1,κ) recognizes an epitope on the
VWFA module at or near the C3b-binding site (data not
shown). We did not observe substantial differences in the
reactivity of the mutants with the four mAbs (data not shown),
suggesting that all epitopes tested are retained in their native
conformation.

Formation of the CoVFB and CoVFBb Complexes — Expres- 
sion of proteolytic activity by the factor B SP domain requires 
binding of factor B to C3b and its proteolytic cleavage by factor 
D. Introducing mutations in the SP domain could alter C3b 
binding and/or susceptibility to factor D cleavage, although 
these functions have been assigned to distal parts of the mol- 
ecule, namely, the CCP and the VWFA modules (1). We exam- 
ined the ability of factor B mutants to form the CoVFB and 
CoVFBb complexes. Choice of CoVF over C3b was dictated by 
the much longer half-life of the complexes, which facilitates 
detection. All mutants showed dose-dependent binding to CoVF 
in the absence (data not shown) and presence (Fig. 3) of factor 
D. Enhancement of binding to CoVF was observed in the pres- 
ence of factor D for all mutants. Factor B carrying single 
mutations at positions 187 or 189 had essentially the same 
binding activity as wt factor B, except for the D187Y mutant, 
which only formed about half as much CoVFBb as wt factor B. 
In the D226 panel of mutants, surprisingly only D226N had wt 
binding activity. The same substitution combined with N189D 
resulted in 50% reduction of binding to CoVF compared with 
either the D226N or N189D mutant. The trypsin-like mutation 
D226G alone or in combination with the N189D mutation 







Fig. 2. Stereoview of the active center of the factor B serine protease domain. The side chains of the catalytic triad residues and of
selected residues lining the pocket are shown. Hydrogen bonds between the carboxyls of Asp226 and the side chains of Asn189 and Thr190 are
shown by dashed lines.



caused 60 and 87% reduction, respectively, in CoVFBb complex 
formation. Similar reductions in CoVF binding ability of the
mutants were also observed without factor D cleavage (data not
shown). The results suggested that, with the exception of the
D226N mutation, substitutions at position 226 affect initial
binding of factor B to CoVF and thus sensitivity to factor D
proteolysis, since binding is a prerequisite for factor B cleavage. In a
more direct factor B cleavage assay, conversion of
biosynthetically labeled factor B to Bb by factor D in the presence of CoVF
was analyzed by SDS-PAGE and autoradiography (Fig. 4). The
results correlated well with the binding data. Mutant D226N 
was as sensitive to factor D cleavage as wt factor B. Mutants 
D226N/N189D, D226G, and D226G/N189D were less suscepti- 
ble to factor D with conversion to Bb estimated at 53, 27, and 
16%, respectively, of that of wt factor B at the high concentra- 
tion of factor D. The combined results suggest that although 
the overall structural integrity of the mutants was preserved, 
as indicated by equivalent reactivity with the module-specific 
mAbs, amino acid substitutions in the SP domain apparently 
affected CoVF/C3b binding, which is mediated by sites on the 
other two domains of the molecule. 

Hemolytic Activity of Factor B Mutants — The effects of the 
mutations on the ability of factor B to cleave/activate C3 and 
C5 were assessed by a hemolytic assay. The hemolytic activity 
of the mutants relative to that of wt factor B is illustrated in 
Fig. 5. Elimination of the negative charge of Asp187 in mutants
D187A, D187N, and D187S resulted in 50-60% loss of hemolytic
activity. Substitution of Tyr at the same position caused a
more pronounced decrease in hemolytic activity, approximately
80%. The data suggest that the bulky hydrophobic side chain of
Tyr is not favored and that full expression of factor B hemolytic
activity requires the salt-bridging conformation of Asp187. Ala
mutation at position 188 in the mutant P188A did not have
a significant effect on the hemolytic activity.

As revealed in the crystal structure, Asn189 and the side
chain of Asp226 are located at the bottom of the primary
specificity pocket and appear to be accessible to the P1-Arg of the
substrate (Fig. 2). Replacement of Asn189 with charged
residues, either Asp or Lys, reduced hemolytic activity by 95%,
while the Ala mutant retained approximately 30% of wt
activity. Although eliminating the negative charge from Asp226 in
the D226N mutant did not affect the assembly of the CoVFBb
complex (Fig. 3), it completely abrogated the C3/C5 convertase
activity. Replacement of the same residue with Gly, present in
trypsin, also resulted in complete loss of hemolytic activity.
Again the loss of hemolytic activity was out of proportion to the
only moderately reduced ability to form the CoVFBb complex
(Fig. 3). Attempts to construct a trypsin-like pocket by
reassigning the negative charge to position 189 in the double
mutants D226N/N189D and D226G/N189D failed to restore
factor B hemolytic activity, despite the residual CoVF binding
activity (Figs. 3 and 6). The hemolytic data strongly indicate
that Asp226 plays a critical and highly specialized role in the
expression of C3/C5 convertase activity by factor B. Residues
Asn189 and Asp187 are also of importance for expression of
factor B-dependent proteolytic activity. In contrast, the Pro
residue at position 188 has no apparent functional role and
likely serves as a spacer between structurally crucial residues.

C3 Cleavage Assay — Decrease of the factor B hemolytic ac- 
tivity could reflect a defect of C3 and/or C5 cleavage. The effects 



[Fig. 3 graphic: binding curves; panel title "D187 Mutants"; x axis, Factor B (ng/ml).]

Fig. 3. Assembly of solid-phase CoVFBb complex by wt and
mutant factor B. Microtiter plates were coated with CoVF (10 μg/ml).
Serial dilutions of wt and mutant factor B in culture supernatants of
transfected COS cells were added and incubated with factor D (1.5
μg/ml) at 37 °C for 2 h. CoVF-bound Bb fragments were detected by
using rabbit anti-human Bb IgG and goat anti-rabbit IgG as detailed
under "Experimental Procedures." Symbols are: A, wt B; D187A;
T, D187N; ♦, D187Y; B, ■, wt B; A, N189A; N189D; N189K; C,
wt B; D226N, O, D226N/N189D; D226G; V, D226G/N189D.



of the mutations on C3 proteolytic activity were assessed by a
direct cleavage assay. Wt factor B and selected mutants were
permanently expressed in CHO cells and purified. Fluid-phase
C3 convertases were formed with CoVF in the presence of
factor D. Conversion of C3 to C3a and C3b was assessed by the
appearance of the α' chain of C3b on SDS-PAGE (Fig. 6). As
shown, under the experimental conditions used, wt factor B
converted 45% of α to α' chain, while there was no conversion
observed in controls not containing CoVF and factor D. The
N189A mutant demonstrated 37% of wt proteolytic activity.
This is consistent with the expression of 29% of wt hemolytic
activity by this mutant (Fig. 5). As expected from the lack of
hemolytic activity, there was no detectable C3 cleavage by the
D226N and D226N/N189D mutants even after prolonged
exposure of the film. However, there was a trace amount of α chain
cleavage by the N189D mutant, seen more clearly after long
exposure of the film. The C3 cleavage study demonstrated that,
at least for the factor B mutants tested, loss of hemolytic
activity could be attributed to loss of proteolytic activity for C3.
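The two percentages quoted above (conversion of α to α' chain, and mutant activity relative to wt) can be related by a simple calculation on scanned band intensities. The sketch below is an assumption about how such numbers may be derived, not a description of the authors' quantification: it takes conversion as the α' intensity divided by the summed α and α' intensities, ignores any difference in antibody reactivity between the two chains, and uses hypothetical intensities chosen to reproduce the 45% and 37% figures.

```python
def percent_conversion(alpha: float, alpha_prime: float) -> float:
    """Percent of alpha chain converted to alpha' chain.

    Assumption (not from the source): band intensities are proportional to
    the amount of each chain, with equal detection efficiency.
    """
    return 100.0 * alpha_prime / (alpha + alpha_prime)

# Hypothetical background-subtracted band intensities (arbitrary units)
wt_conv = percent_conversion(alpha=55.0, alpha_prime=45.0)
mutant_conv = percent_conversion(alpha=83.35, alpha_prime=16.65)

# Mutant proteolytic activity expressed relative to wt
relative_to_wt = 100.0 * mutant_conv / wt_conv
print(f"wt: {wt_conv:.1f}% converted; mutant: {mutant_conv:.2f}% converted")
print(f"mutant activity = {relative_to_wt:.0f}% of wt")
```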

Esterolytic Activity — Because C3 is a large protein substrate,
extensive molecular contacts with C3b-bound Bb are probably
required for its proteolysis. Hydrolysis of small synthetic
thioester substrates containing Arg at the P1 site could provide
further insights into substrate recognition. In the present
study we chose Z-Lys-Arg-SBzl as substrate because it was
shown to be the most reactive among the Arg-containing C3-
or C5-like substrates tested by Kam et al. (14). The catalytic
efficiency (kcat/Km) of recombinant wt factor B was 1135



[Fig. 4 autoradiograph: lanes with (+) or without (-) CoVF and with 300 or 2 ng of factor D; marker positions 97.4 and 66.0; bands B and Bb.]



Fig. 4. Cleavage of CoVF-bound factor B by factor D.
[35S]Met-labeled wt and Asp226 factor B mutants secreted by transiently
transfected COS cells were incubated with two different concentrations of
factor D in the presence of 5 μg of CoVF for 2 h at 37 °C or with the high
concentration of factor D in the absence of CoVF as control. After
incubation, immunoprecipitation was performed by using a rabbit
anti-human Bb IgG and S. aureus protein A. Immunoprecipitates were
washed and subjected to 7.5% SDS-PAGE and autoradiography.
Positions and molecular mass of marker proteins are given on the left.



M^-1 s^-1 (Fig. 7), which is similar to the 1370 M^-1 s^-1 value reported
previously for native factor B (14). The recombinant B-SP had a
kcat/Km of 198 M^-1 s^-1, which is 5.7 times lower than that of
intact factor B. Measurement of individual kinetic parameters
showed that the decreased kcat/Km of B-SP was mainly due to a
4-fold increase in Km. Of the mutants tested, D226N showed a
50-fold slower catalytic rate than wt factor B. However,
placement of a negative charge at position 189 on the D226N
background partially restored esterolytic activity. As shown, the
kcat/Km of the double mutant D226N/N189D was about 10-fold
higher than that of D226N. As indicated by the lower than wt
factor B kcat and unaltered Km, the decreased catalytic efficiency of
these two mutants could be directly attributed to the decreased
catalytic rate. These results strongly suggest that the
negatively charged Asp226 determines binding specificity and
catalytic efficiency for the substrate Z-Lys-Arg-SBzl. Substitutions
of Asp or Ala for Asn189 in N189D and N189A caused 2.7- and
6.6-fold lower activity, respectively. Although N189A factor B
had slightly lower esterolytic activity than N189D factor B, it
had substantially higher proteolytic activity for C3 (Fig. 6). Our
findings demonstrated that in addition to Asp226, Asn189 also
participates in substrate recognition and in determining
specificity for C3. Apparently, the structural configuration of
residues Asp226 and Asn189 of factor B is critical for recognition and
cleavage of C3 and C5.
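The fold differences quoted in this paragraph follow from simple ratios of the reported catalytic efficiencies, as the short check below illustrates. The three efficiency values are taken from the text; the variable names are hypothetical, and the final step assumes, as the paragraph states, that Km is unchanged in D226N so that its 50-fold lower kcat translates into a 50-fold lower kcat/Km.

```python
# Catalytic efficiencies (kcat/Km) quoted in the text
wt_eff = 1135.0       # recombinant wt factor B
bsp_eff = 198.0       # recombinant B-SP
native_eff = 1370.0   # value reported previously for native factor B (14)

# How much lower is B-SP than intact recombinant factor B?
fold_lower_bsp = wt_eff / bsp_eff
# How close is recombinant wt factor B to the native value?
recombinant_vs_native = wt_eff / native_eff

# D226N: about 50-fold lower than wt (Km unaltered, kcat 50-fold lower).
# D226N/N189D: about 10-fold higher than D226N.
d226n_rel = 1.0 / 50.0
double_rel = d226n_rel * 10.0
percent_reduction_double = 100.0 * (1.0 - double_rel)

print(f"B-SP is {fold_lower_bsp:.1f}-fold less efficient than wt factor B")
print(f"recombinant wt is {100 * recombinant_vs_native:.0f}% of the native value")
print(f"D226N/N189D retains {100 * double_rel:.0f}% of wt efficiency "
      f"({percent_reduction_double:.0f}% reduction)")
```

The resulting reduction for the double mutant agrees with the 80% figure cited later in the Discussion.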

DISCUSSION 

Determination of the structure of the SP domain of factor B 
revealed a number of novel insertions and deletions compared 
with typical SPs and also certain unique structural features of 
the catalytic apparatus, especially in the primary specificity 
pocket (data not shown). In the present study, mutational 
analysis of factor B residues in and around the primary speci- 
ficity pocket was performed to investigate structural correlates 
of substrate recognition at the site. The results are discussed 
in light of the large amount of available information on SP 
specificity. 

Our results clearly demonstrate that Asp226 of factor B is a
critical structural determinant for substrate binding and
catalysis, substituting for Asp189 of other SPs with trypsin-like
specificity. Functional analysis of the D226N mutant provided
the most clear-cut results. The observed loss of esterolytic and
proteolytic activity of this mutant could be attributed solely to
a catalytic defect resulting from inappropriate engagement of
the P1-Arg in the S1 site, while other functional sites necessary
for the proteolytic activation and substrate binding appeared to
be well preserved. A sharp 50-fold decrease in catalytic rate
(kcat) indicates that a negative charge at the bottom of the






Fig. 5. Hemolytic activity of factor B mutants. EC3b (1.5 x 10') were
incubated with serial dilutions of wt and mutant factor B in culture medium of
transfected COS cells, factor D (12.5 ng), and properdin (125 ng) at 30 °C for 30 min.
Hemolysis was allowed to occur at 37 °C for 1 h after addition of 1:40 dilution of
guinea pig serum in EDTA buffer. For each mutant specific hemolytic activity
(units/μg) was calculated and normalized to that of wt B. Each bar represents the
average ± S.E. of the results of at least three separate experiments performed in
duplicate.



[Fig. 5 graphic: bar chart; y axis ticks 0.0, 0.5, 1.0.]

[Fig. 6 blot: lanes noB, wtB, N189A, N189D, D226N, D226N/N189D, and C3 only, with (+) or without (-) CoVF/D; marker positions 130 and 72.]

Fig. 6. Proteolytic activity of C3 convertases formed by CoVF
and wt or mutant factor B. Wt or mutant factor B (50 ng) and C3 (75
ng) were incubated for 1 h at 37 °C with (+) or without (-) CoVF (150
ng) and D (12.5 ng). Aliquots of the reaction mixture were analyzed on
7.5% SDS-PAGE under reducing conditions. C3 polypeptide chains
were detected on Western blots by using a goat anti-human C3 IgG.
Positions and molecular mass of marker proteins are shown on the left.
Positions of α, α', and β chains of C3 are given on the right.

primary pocket is essential for efficient catalysis, but not for
overall substrate binding affinity, because the Km is not altered
by the Asn substitution (Fig. 7). Apparently, hydrogen bond
formation of the P1-P3 residues to the nonspecific
substrate-binding site, Ser-Trp-Gly (residues 214-216), and hydrophobic anchoring of
the P2 and P3 side chains to the S2 and S3 pockets, respectively,
provide sufficient binding force. Also it seems likely that Asn189
provides additional binding energy, probably by hydrogen
bonding with P1-Arg. However, positioning of the scissile bond
relative to Ser195 and the oxyanion hole through the putative
hydrogen bonds may differ from that effected by the direct ionic
contact made by Asp226 in wt factor B. Replacing Asp226 with
Asn affected equally esterolytic and C3 proteolytic activity,
although D226N factor B could form a CoVFBb complex. In a
recent report Hourcade et al. (30) also found that substitution
of various residues (Asn, Ala, Ser, and Tyr) for Asp226 caused
severe reduction in proteolytic activity despite normally
assembled C3bBb complex. It is of special interest that the
conservative substitution of Glu for Asp226 also abrogated C3
proteolytic activity. This observation suggests that accurate
positioning of the carbonyl group of P1-Arg of C3 relative to the
nucleophilic Ser195 O-γ and oxyanion hole can only be achieved
by the native residue Asp226. A corresponding trypsin mutant,
D189E, displayed 2-3 orders of magnitude decrease in catalytic
efficiency (kcat/Km), associated with a 40-fold shift in the
preference from Arg to Lys substrates relative to wt trypsin (31).
Apparently, the additional methylene group distancing the
carboxylate of trypsin D189E from the peptide backbone within
the narrow pocket impeded the proper positioning of the side
chain of Arg, which is longer and larger than that of Lys. The
loss of C3 catalytic activity by D226E factor B (30) can probably
be attributed to a similar spatial effect.

Another structural characteristic of the S1 pocket of factor B
is a hydrogen bonding network formed by the carboxyl oxygens




[Fig. 7 graphic: bars for wt B, B-SP, N189A, N189D, D226N, and D226N/N189D.]

Fig. 7. Hydrolysis of synthetic thioester substrate by wt and
mutant factor B and the factor B serine protease domain.
Purified wt or mutant factor B or recombinant B-SP (113-200 nM) was
incubated with Z-Lys-Arg-SBzl at concentration of 0.08-0.8 mM.
Hydrolysis was measured at 25 °C in the presence of Ellman's reagent
5,5'-dithiobis-(2-nitrobenzoic acid) used as a chromogen of hydrolysis.
Kinetic parameters were derived from Lineweaver-Burk plots. The
values of individual parameters are the average ± S.E. of at least three
independent determinations.

of Asp226 and pocket residues Asn189, Thr190, and Arg225 (Fig.
2). This effectively reduces ionic bonding potential available for
making contacts with P1-Arg of the substrate. On one hand,
this distinct feature could possibly explain the overall low
esterolytic activity of factor B, Bb (12-14), and B-SP (Fig. 7).
On the other hand, it implies the need for additional bonding
between P1-Arg and other pocket residues. The side chain of
Asn189 faces the carboxyl of Asp226 from the opposite wall and
occupies a central position at the bottom of the specificity
pocket. Although the position of the Asn189 side chain is about
0.5-1.0 Å lower than that of Asp226, it appears accessible to the
substrate. Our results indicate a supporting role for Asn189 in
substrate recognition and catalysis. Substitution of Ala, Asp, or
Lys at this position caused substantial reduction or abrogation
of hemolytic activity, which paralleled a similar reduction in C3




proteolytic activity (Figs. 5 and 6). The Ala substitution caused
a decline in synthetic substrate binding affinity (Km) and
catalytic efficiency (kcat/Km), which strongly indicates
participation of Asn189 in substrate recognition. The amine group of the
Asn189 side chain may mediate P1-Arg binding through a
hydrogen bond. Absence of this potential binding force may
compromise accurate register of P1-Arg of C3 for catalysis.
Substitution of a charged residue, Asp or Lys, for Asn189 in N189D and
N189K, respectively, abrogates C3 proteolytic activity of the
C3- or CoVF-bound Bb. Interestingly, the N189D mutant
retains substantial esterolytic activity toward the synthetic
substrate. These results suggest that the reconstructed S1 pocket,
with free carboxyls at positions 226 and 189, despite its altered
geometry could register the Arg bond of the synthetic substrate,
but not that of C3, to the His57-Ser195 dyad. The free leading
or leaving group of the synthetic substrate may account for the
observed binding flexibility.

C2 and factor B have identical proteolytic specificity for
single Arg peptide bonds of C3 and C5 so that their
substrate-binding sites can be presumed to be very similar in geometry
and chemical nature. Thus, it is not surprising that C2 has Asp
and Ser at positions 226 and 189, respectively (Fig. 1). Besides
factor B and C2, an acidic residue is also present at position 226
in a few additional members of the chymotrypsin family,
namely fiddler crab collagenase (cCOLL) (32), human
cathepsin G (CATG) (33), protease 3 (hPR03) (34), and neutrophil
elastase (hnELA) (35). In contrast to C2 and factor B these
serine proteases display relatively broad substrate specificity.
cCOLL and CATG recognize not only basic but also large
hydrophobic side chains (32, 36). The Arg/Lys substrate
preference is mainly attributed to the presence of Asp226/Gly189 in
cCOLL and of Glu226/Ala189 in CATG within the S1 pocket. The
large and flexible S1 pocket in cCOLL allows this enzyme to
adjust to different shapes of the side chain. Removal of the
negative charge from the cCOLL S1 pocket in the D226G
mutant resulted in a significant decrease of catalytic efficiency
toward Arg/Lys substrates (37). Similarly to Asp226 in factor B
and cCOLL, the corresponding Glu226 in human CATG has
only one carboxyl oxygen available for substrate binding (33).
This may be responsible for the relatively slow catalysis of
substrates with P1-Lys or Arg. However, the presence of a
negatively charged residue at position 226 is not a sufficient
condition for specificity for basic residues. Neither hPR03 nor
hnELA, both of which have an Asp226, recognizes a P1-Lys or
P1-Arg residue. The two enzymes display close similarity of
their S1 sites and cleave after small, mostly hydrophobic
residues, such as Leu/Ile (hnELA), Ala/Ser (hPR03), and Val/Met
(hnELA and hPR03) (38). The presence of Ile and Val at
position 190 of hPR03 and hnELA, respectively, seems partially
responsible for their substrate specificities. In hnELA, loss of
specificity for basic residues has been attributed to
inaccessibility of Asp226 that is shielded by Val190 and Val216. Similarly,
Asp226 of hPR03 is also shielded by Ile190 and Val. Taken
together, the data indicate that Arg/Lys substrate specificity is
structurally determined not only by the presence but also by
the accessibility of an acidic side chain at the base of the
specificity pocket, positioned either at 189 or 226. The carboxyl
oxygens of Asp226 or Glu226 seem less available to substrate
than those of Asp189 because of participation in
hydrogen-bonding networks with residues on the wall of the pocket. This
appears to be a distinct feature observed in factor B, the
neutrophil elastases, and cCOLL.

Structural and functional consequences of altering the
Asp189 of trypsin have been examined by site-directed
mutagenesis, kinetic, and crystallographic analysis (39). The
negative charge was relocated to the opposite wall of the binding




pocket in rat trypsin mutant D189G/G226D. Kinetic analysis 
showed that, compared with wt trypsin, this relocation of the 
negative charge caused 10^- and 4.5 X 10^-fold decrease in 
catalytic efficiency (kcat/Km) toward P1-Arg and -Lys containing
substrates, respectively. The decrease resulted from a much
sharper decline in kcat for the Arg than the Lys substrates,
whereas the binding affinity (Km) for both substrates was
equally reduced. The crystal structure of D189G/G226D trypsin
in complex with inhibitors showed that in its new position,
Asp226 interacts extensively with other residues in the pocket
through hydrogen bonds, which greatly reduce its negative
charge potential. Similarly to trypsin D189G/G226D, the native
Asp226 of factor B forms hydrogen bonds and this correlates
with the low binding affinity and overall low catalytic efficiency
toward P1-Arg/Lys peptide substrates (12-14). Re-constructing
the pocket of factor B in the D226N/N189D mutant caused
complete loss of hemolytic and C3 proteolytic activity (Figs. 5
and 6), although esterolytic activity toward the P1-Arg thioester
substrate was partially retained (Fig. 7). The kinetic
analysis showed that the 80% reduction in esterolytic activity
(kcat/Km) was almost entirely due to reduction in kcat, whereas
the Km was not affected. Thus, the exact location of the negative
charge at the base of the S1 site and particularly its spatial
relationship to the His57-Ser195 dyad and the oxyanion hole,
which is altered in trypsin D189G/G226D and factor B D226N/
N189D, are especially critical for efficient catalysis.
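
The relation between the quantities discussed above can be sketched numerically. The snippet below is only an illustration: the function name and the wild-type values are hypothetical placeholders in arbitrary units, not measurements from this study. It shows that, when Km is unchanged, an 80% reduction in kcat/Km corresponds to an 80% reduction in kcat.

```python
def catalytic_efficiency(kcat: float, km: float) -> float:
    """Return the catalytic efficiency kcat/Km."""
    if km <= 0:
        raise ValueError("Km must be positive")
    return kcat / km


# Hypothetical wild-type values in arbitrary units (NOT taken from the paper).
wt_kcat, wt_km = 10.0, 2.0

# Hypothetical mutant: Km unchanged, kcat reduced to 20% of wild type.
mut_kcat, mut_km = 0.2 * wt_kcat, wt_km

wt_eff = catalytic_efficiency(wt_kcat, wt_km)
mut_eff = catalytic_efficiency(mut_kcat, mut_km)

# Fractional loss of catalytic efficiency relative to wild type.
reduction = 1.0 - mut_eff / wt_eff
print(f"reduction in kcat/Km: {reduction:.0%}")
```

Because kcat/Km is a simple ratio, any fold change in it factors into the fold change in kcat divided by the fold change in Km, which is why the two parameters are reported separately.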

In an effort to directly compare factor B to trypsin, a Gly
residue was substituted at position 226 either alone (D226G) or
in combination with the N189D mutation (D226G/N189D). Neither
mutant had hemolytic activity. However, loss of hemolytic
activity could not be attributed exclusively to defective substrate
recognition at the S1 site because the ability of these
mutants to participate in the assembly of the C3 convertase
was also affected (Figs. 3 and 5). Binding of the mutants to
CoVF and their sensitivity to factor D cleavage was substantially
decreased, indicating conformational changes near or at
the C3b/CoVF-binding sites, which are presumed to be distal to
the mutation sites. Because overall folding of the polypeptide
chain and the conformation of antigenic epitopes appeared
unaffected, the conformational alteration of the C3b-binding
site must be subtle, albeit functionally significant. At present it
is not clear how the catalytic center relates spatially to the
C3b/CoVF-binding sites. Hourcade et al. (30) also described a
conformational change at a site distal from the mutation in the
F227A mutant. The mutant was cleavable by factor D, but
cleavage did not promote the conformational change to a high
affinity C3b-binding proteolytically active state, which characterizes
wt factor B. The Bb fragment of this mutant was recognized
by a Bb-specific mAb at much lower efficiency than the
wt counterpart. As viewed in the structure of B-SP, the
RDFHIN225-230 segment forms an extended internal β-strand,
which is buried within the protein core. Substituting Ala for
Phe at position 227 might destabilize the core, affecting the
conformation of the surface epitope recognized by the Bb-specific
mAb (30). This epitope is probably located near the
RDFHIN225-230 segment and is only reactive in Bb perhaps
because it is sterically hindered by the Ba region of intact factor
B or because it undergoes a conformational change upon
cleavage/removal of Ba. Our D226G mutants might have
conformational change(s) within the same region. However, the
relationship between the possible conformational change of the
antigenic epitope and that of the C3b-binding site is still
unclear.

It is of interest that the RDFHIN225-230 motif is found in
factor B and C2 of most animal species, but is absent from all
other complement enzymes (1) as well as from other SPs of the
large chymotrypsin family (40, 38). This underlines the fundamental
role of Asp226 in the function of factor B and C2 in
complement activation. Therefore, the native conformation of
Asp226 and Asn189 or Ser189 within the S1 pocket of factor B and
C2, respectively, constitutes one of the structural determinants,
which have evolved to optimize the highly specific C3/C5
cleavage. However, C3/C5 recognition and hydrolysis require
more extensive enzyme-substrate contacts than interaction of
the side chain of P1-Arg with residues of the S1 site. The
disparity in catalytic activity toward C3 and dipeptide substrates
of N189D and D226N/N189D factor B (Figs. 6 and 7)
probably reflects the complexity of the interaction between
C3b-bound Bb and its natural substrates, C3 and C5.

In the present study, we correlated the crystal structure of
B-SP to the detailed mutational analysis of the factor B S1
pocket. The resulting information contributes to current understanding
of the structural basis for factor B and C2 substrate
specificity and catalysis. Such knowledge is crucial for designing
highly specific inhibitors that could have therapeutic potential
for complement-mediated human diseases.

Acknowledgments — We express our appreciation to Xiao Ying Liu 
and Yuling Dai for excellent technical assistance. 

REFERENCES 

1. Arlaud, G. J., Volanakis, J. E., Thielens, N. M., Narayana, S. V. L., Rossi, V., and Xu, Y. (1998) Adv. Immunol. 69, 249-307
2. Volanakis, J. E. (1998) in The Human Complement System in Health and Disease (Volanakis, J. E., and Frank, M. M., eds) pp. 9-32, Marcel Dekker, Inc., New York
3. Cohen, G. H., Silverton, E. W., and Davies, D. R. (1981) J. Mol. Biol. 148, 449-479
4. Gráf, L., Craik, C. S., Patthy, A., Roczniak, S., Fletterick, R. J., and Rutter, W. J. (1987) Biochemistry 26, 2616-2623
5. Gráf, L., Jancsó, Á., Szilágyi, L., Hegyi, G., Pintér, K., Náray-Szabó, G., Hepp, J., Medzihradszky, K., and Rutter, W. J. (1988) Proc. Natl. Acad. Sci. U. S. A. 85, 4961-4965
6. Schellenberger, V., Braune, K., Hofmann, H.-J., and Jakubke, H.-D. (1991) Eur. J. Biochem. 199, 623-636
7. Craik, C. S., Largman, C., Fletcher, T., Roczniak, S., Barr, P. J., Fletterick, R., and Rutter, W. J. (1985) Science 228, 291-297
8. Jing, H., Macon, K. J., Moore, D., Lawrence, J. D., Volanakis, J. E., and Narayana, S. V. L. (1999) EMBO J. 18, 804-814
9. Volanakis, J. E. (1989) in Year Immunol. Cellular, Molecular and Clinical Aspects (Cruse, J. M., and Lewis, R. E., Jr., eds) Vol. 4, pp. 218-230, S. Karger, Basel
10. Pangburn, M. K., and Muller-Eberhard, H. J. (1986) Biochem. J. 235, 723-730
11. Cooper, N. R. (1975) Biochemistry 14, 4245-4251
12. Ikari, N., Hitomi, Y., Niinobe, M., and Fujii, S. (1983) Biochim. Biophys. Acta 742, 318-323
13. Caporale, L. H., Gaber, S.-S., Kell, W., and Gotze, O. (1981) J. Immunol. 126, 1963-1965
14. Kam, C.-M., McRae, B. J., Harper, J. W., Niemann, M. A., Volanakis, J. E., and Powers, J. C. (1987) J. Biol. Chem. 262, 3444-3451
15. Horiuchi, T., Kim, S., Matsumoto, M., Watanabe, I., Fujita, S., and Volanakis, J. E. (1993) Mol. Immunol. 30, 1587-1592
16. Zoller, M. J., and Smith, M. (1983) Methods Enzymol. 100, 468-500
17. Kunkel, T. A. (1985) Proc. Natl. Acad. Sci. U. S. A. 82, 488-492
18. Tabor, S., and Richardson, C. C. (1987) Proc. Natl. Acad. Sci. U. S. A. 84, 4767-4771
19. Itakura, K., Rossi, J. J., and Wallace, R. B. (1984) Annu. Rev. Biochem. 53, 323-356
20. Agrawal, A., Xu, Y., Ansardi, P., Macon, K. J., and Volanakis, J. E. (1992) J. Biol. Chem. 267, 25353-25358
21. Kim, S., Narayana, S. V. L., and Volanakis, J. E. (1994) Biochemistry 33, 14393-14399
22. Ueda, A., Kearney, J. F., Roux, K. H., and Volanakis, J. E. (1987) J. Immunol. 138, 1143-1149
23. Tuckwell, D. S., Xu, Y., Newham, P., Humphries, M. J., and Volanakis, J. E. (1997) Biochemistry 36, 6605-6613
24. Circolo, A., Nutter, T. B., and Strunk, R. C. (1997) in Complement, a Practical Approach (Dodds, A. W., and Sim, R. B., eds) pp. 199-221, Oxford University Press, Oxford
25. Kulics, J., Circolo, A., Strunk, R. C., and Colten, H. R. (1994) Immunology 82, 509-515
26. Volanakis, J. E., and Macon, K. J. (1987) Anal. Biochem. 163, 242-246
27. Gresham, H. D., Matthews, D. F., and Griffin, F. M., Jr. (1986) Anal. Biochem. 154, 454-459
28. Rapp, H. J., and Borsos, T. (1970) in Molecular Basis of Complement Action (Rapp, H. J., and Borsos, T., eds) Century Crofts, New York
29. Xu, Y., and Volanakis, J. E. (1997) J. Immunol. 158, 5958-5965
30. Hourcade, D. E., Mitchell, L. M., and Oglesby, T. J. (1998) J. Biol. Chem. 273, 25996-26000
31. Evnin, L. B., Vásquez, J. R., and Craik, C. S. (1990) Proc. Natl. Acad. Sci. U. S. A. 87, 6659-6663
32. Tsu, C. A., Perona, J. J., Schellenberger, V., Turck, C. W., and Craik, C. S. (1994) J. Biol. Chem. 269, 19565-19572
33. Hof, P., Mayr, I., Huber, R., Korzus, E., Potempa, J., Travis, J., Powers, J. C., and Bode, W. (1996) EMBO J. 15, 5481-5491
34. Fujinaga, M., Chernaia, M. M., Halenbeck, R., Koths, K., and James, M. N. G. (1996) J. Mol. Biol. 261, 267-278
35. Navia, M. A., McKeever, B. M., Springer, J. P., Lin, T.-Y., Williams, H. R., Fluder, E. M., Dorn, C. P., and Hoogsteen, K. (1989) Proc. Natl. Acad. Sci. U. S. A. 86, 7-11
36. Polanowska, J., Krokoszynska, I., Czapinska, H., Watorek, W., Dadlez, M., and Otlewski, J. (1998) Biochim. Biophys. Acta 1386, 189-198
37. Tsu, C. A., Perona, J. J., Fletterick, R. J., and Craik, C. S. (1997) Biochemistry 36, 5393-5401
38. Czapinska, H., and Otlewski, J. (1999) Eur. J. Biochem. 260, 571-595
39. Perona, J. J., Tsu, C. A., McGrath, M. E., Craik, C. S., and Fletterick, R. J. (1993) J. Mol. Biol. 230, 934-949
40. Greer, J. (1990) Proteins: Struct. Funct. Genet. 7, 317-334



Exhibit 42 




BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS 219, 806-812 (1996)
ARTICLE NO. 0315 






Complementary DNA Cloning and Sequencing of Rat Enteropeptidase
and Tissue Distribution of Its mRNA

Naohisa Yahagi,* Masao Ichinose,*,1 Masashi Matsushima,* Yasuo Matsubara,*
Kazumasa Miki,* Kiyoshi Kurokawa,* Hiroshi Fukamachi,† Kosuke Tashiro,†
Koichiro Shiokawa,† Takeshi Kageyama,‡ Takayuki Takahashi,§ Hideshi Inoue,¶ and
Kenji Takahashi¶

*First Department of Internal Medicine, Faculty of Medicine and †Zoological Institute, Faculty of Science, University
of Tokyo, Tokyo 113, Japan; ‡Department of Molecular and Cellular Biology, Primate Research Institute, Kyoto
University, Inuyama 484, Japan; §Division of Biological Science, Graduate School of Science, Hokkaido University,
Sapporo 060, Japan; and ¶Laboratory of Molecular Biochemistry, Molecular Science Division, School of Life Science,
Tokyo University of Pharmacy and Life Science, Tokyo 192, Japan

Received January 19, 1996 

A cDNA clone encoding enteropeptidase (EC 3.4.21.9), a key enzyme for the conversion of trypsinogen to
trypsin, was isolated from a rat duodenal mucosa cDNA library. Sequence of the 3585 base pair clone predicted
that enteropeptidase is synthesized as a single-chain precursor form, proenteropeptidase, consisting of 1058
amino acid residues with an internal signal sequence (51 residues) and is then processed into the mature enzyme
consisting of three different peptide chains, i.e., mini, light and heavy chains, not the previously reported
two-chain enzyme. The structure of enteropeptidase is relatively conserved among different species and the rat
enteropeptidase is 24 and 39 amino acids longer than the porcine and human ones, respectively. Northern blot
analysis of RNAs from normal rat tissues revealed that the enteropeptidase mRNA of around 4.4 kb in size was
expressed only in the duodenal mucosa, and high proteolytic activity of the enzyme was detected in the proximal
small intestine. Additional analysis of the RNAs by RT-PCR revealed that a low level of the mRNA was also
expressed in the other parts of the small intestine, i.e., jejunum and ileum. These results indicate that the
biosynthesis of enteropeptidase takes place mainly in the proximal small intestine, the duodenum, and the
importance of the region in the physiology of intestinal protein digestion regulated by the enzyme is suggested.
Furthermore, a faint signal of the mRNA was also detected in the stomach, colon and brain, in which the existence
of trypsin-like serine proteases was reported. The significance of the low level expression of the gene is unclear,
but the potential peptide-processing function of the enzyme in these tissues is also suggested. © 1996 Academic
Press, Inc.






Enteropeptidase (enterokinase, EC 3.4.21.9) was initially recognized as an intestinal factor which
activates the latent enzymes in pancreatic fluid. Later the enzyme was proved to be involved in the
conversion of trypsinogen to trypsin (1), leading to the activation of various pancreatic zymogens
involved in the later stages of the digestive cascade. Therefore, enteropeptidase has been considered
to be a key enzyme in intestinal protein digestion. Because of its medical and physiological
importance, the enzyme has been purified from the small intestine of various species, including
bovine (2), porcine (3) and human (4). In addition, the cDNA structures have recently been
determined in these species by us and others (5-8). However, the details of the structure and
function of the enzyme are still unclear. For example, the number of the peptide chains
composing the mature enzyme is reported differently depending on the species, and the mechanism
of the enzyme activation remains to be elucidated. Also unclear is the regulatory mechanism of its
synthesis in the gastrointestinal tract. In order to clarify these problems and because the laboratory
rat is a highly developed experimental model to study the physiology of the intestinal digestion, we
attempted to characterize rat enteropeptidase. In this study, we determined the nucleotide sequence



1 Corresponding author. Fax: 03-3812-5063.

0006-291X/96 $18.00
Copyright © 1996 by Academic Press, Inc.
All rights of reproduction in any form reserved.




of the cDNA encoding rat enteropeptidase and the primary structure of the enzyme, and analyzed the
gene expression in the rat digestive tract and various other organs.

MATERIALS AND METHODS

Tissue preparation, RNA isolation and assay of enzymatic activities of enteropeptidase. All tissues were collected from
Wistar strain male adult rats (8 weeks old, Charles River Japan, Inc.). The excised tissues were washed with ice-cold
phosphate buffered saline and were stored frozen in liquid nitrogen until use. From the tissues, total RNA was prepared by
the guanidinium isothiocyanate/cesium chloride density gradient ultracentrifugation and poly(A)+ RNA was selected by
oligo(dT)-cellulose column chromatography. The proteolytic activity of enteropeptidase in the tissue samples was measured
fluorometrically by a modified method of Antonowicz et al. (9), using a synthetic substrate [Gly-(Asp)4-Lys-β-
naphthylamide]. Unless otherwise specified, 2 mM EDTA was included in the reaction mixture in this study.

Isolation and characterization of the cDNA clone for rat enteropeptidase. Rat duodenal mucosa poly(A)+ RNA was used
for the preparation of a cDNA library. Double-stranded cDNA was synthesized according to the procedure of Gubler and
Hoffman (10). After methylation of the internal EcoRI sites and addition of EcoRI linkers, the cDNAs were fractionated
according to their size by agarose-gel electrophoresis. The cDNA larger than 1.5 kb in length was ligated into the EcoRI sites
of lambda ZAP II vector (Stratagene, USA). The phages were packaged and recombinants were selected by plating on E.
coli strain XL-1 blue. Nylon filters that carried denatured recombinant DNAs were screened by [32P]-labeled porcine
enteropeptidase cDNA (7). The positively hybridized clones were identified and isolated by repeated purification. The
purified phages were converted to the corresponding plasmid form utilizing the plasmid excision procedure provided by the
manufacturer and were used as a template for DNA sequencing. Sequencing was performed by dideoxy chain termination
method on both strands of denatured plasmid cDNA inserts using a Taq dye terminator sequencing kit (Applied Biosystems,
Inc.), a thermal cycler (model 480, Perkin Elmer Cetus), and a DNA sequencer (model 373A, Applied Biosystems, Inc.).

mRNA detection by Northern blotting and RT-PCR. 10 μg of total RNA from various rat tissues were denatured and
subjected to electrophoresis on a 0.66 M formaldehyde-agarose gel. After the RNA had been transferred to a nylon
membrane filter, the filter was hybridized with the 32P-labeled full-length cDNA for rat enteropeptidase under high-
stringency conditions. The size of RNA was estimated by reference to the mobility of 18S and 28S rRNAs and fragments
of λDNA generated by digestion with HindIII. Primers specific for the amplification of the rat enteropeptidase heavy chain
(5' primer, 5'-ATTTGATGATGCrmTTG-3'; 3' primer, 5'-AGCriTGG I" I C 1 G G ATA AG-3'; size of the amplified
fragment, 491 bp) and G3PDH (5' primer, 5'-TGAAGGTCGGTGTCAACGGATTTGGC-3'; 3' primer, 5'-CATGTAGGCC-
ATGAGGTCCACCAC-3') were synthesized with a DNA synthesizer (model 380A, Applied Biosystems, Inc.)
and purified by gel filtration. For each reaction, 1 μg of poly(A)+ RNA from representative tissues was reverse-transcribed
to cDNA and the resulting cDNA was subjected to 20 to 40 cycles of PCR using Takara Taq DNA polymerase (Takara,
Japan) under the following conditions: 94°C for 60 sec → 48°C for 30 sec → 74°C for 60 sec. In the above-mentioned
conditions, the amplified signal derived from the genomic DNA encoding enteropeptidase was around 1.6 kb in size. The
PCR products were electrophoresed through a 1.0% agarose gel in 1X TAE buffer and visualized by ethidium bromide
staining.
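
The cycling conditions above can be summarized with a short calculation. The following is a minimal sketch, not part of the original protocol: the function names are hypothetical, ramping time between steps is ignored, and perfect doubling per cycle is an idealizing assumption rather than a measured efficiency.

```python
# Per-cycle steps as stated in the Methods: (temperature in deg C, hold time in s).
CYCLE_STEPS = [(94, 60), (48, 30), (74, 60)]


def hold_time_minutes(n_cycles: int) -> float:
    """Total programmed hold time in minutes, ignoring ramping between steps."""
    seconds_per_cycle = sum(seconds for _, seconds in CYCLE_STEPS)
    return n_cycles * seconds_per_cycle / 60


def ideal_fold_amplification(n_cycles: int, efficiency: float = 1.0) -> float:
    """Idealized product increase, (1 + efficiency) ** n_cycles.

    efficiency = 1.0 (perfect doubling) is an assumption for illustration only.
    """
    return (1 + efficiency) ** n_cycles


# The Methods state that 20 to 40 cycles were used.
for n in (20, 30, 40):
    print(n, hold_time_minutes(n), f"{ideal_fold_amplification(n):.2e}")
```

Under the perfect-doubling assumption, each additional four cycles corresponds to a 16-fold increase in product, which is why comparing signals obtained at different cycle numbers gives only a semiquantitative estimate of template abundance.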

RESULTS AND DISCUSSION 

Approximately 5 x 10^ clones were screened by hybridization with a full-length porcine
enteropeptidase cDNA. Over 500 clones were identified as positive for the probe. Among these
clones, 50 clones hybridized positively with a 0.6 kb EcoRV fragment representing the NH2-terminal



[Restriction map of the cDNA clone, drawn 5' to 3'; labeled restriction sites include Pst I, EcoR V, Hind III, BamH I and Hinc II; scale bar, 1 kb]



FIG. 1. Restriction map and sequencing strategy of a rat enteropeptidase cDNA clone (REK#7). Deletion mutants
constructed from subcloned fragments were used for nucleotide sequencing, and sequencing was done in both directions as
described in Materials and Methods. Arrows indicate the direction and extent of sequencing of fragments subcloned in
pBluescript. Lines indicate the 5'- and 3'-noncoding region, a closed box indicates the putative internal signal sequence of
proenteropeptidase. Open boxes indicate the coding region including the M, H and L-chains of mature enteropeptidase.


domain of porcine enteropeptidase. These clones were isolated by repeated purification. The
restriction site map constructed for these clones revealed that their structures are basically the same
and the nucleotide sequencing on bilateral ends disclosed a common nucleotide sequence. One
clone (REK#7) was found to contain the entire coding region for rat enteropeptidase. The restriction
map and the sequence analysis strategy of the clone are shown in Fig. 1. The resulting nucleotide
sequence and the deduced amino acid sequence of rat enteropeptidase are presented in Fig. 2. The
analyzed cDNA clone was 3585 base pairs (bp) long, including the 5'-noncoding region (166 bp),
the coding nucleotide sequence (3,174 bp) and the 3'-noncoding region (245 bp). A typical
polyadenylation signal was present at the 3554th base pair position. The second methionine codon at
nucleotides 167-169 in the open reading frame meets the criteria for the initiation site of the
translation (11). Thus, the cDNA encoding rat enteropeptidase predicts a molecule of 1058 amino
acid residues (Mr = 117,700). Recently, we purified the enzyme from porcine duodenal mucosa
and structurally characterized it. In addition, we have cloned and analyzed the cDNA coding for the
protein (7). The primary structures of the rat and porcine enzymes are relatively conserved: 77%
identical in the nucleotide sequence and 71% in the encoded amino acid sequence. The comparison
of the rat cDNA sequence with the porcine one indicated that the enzyme is originally synthesized
as a single-chain precursor and processed into a three-chain enzyme rather than the heterodimeric
enzyme previously reported in other species (2,3). The NH2-terminal sequences of the mini (M),
heavy (H), and light (L)-chains are deduced to start at positions 53, 119, and 819, respectively, thus
leading to the production of three chains consisting of 66 (Mr = 7,700), 700 (Mr = 77,700), and 240
(Mr = 26,800) amino acid residues. There is a hydrophobic domain comprising 25 amino acid
residues preceding the NH2-terminus in the rat proenteropeptidase sequence (double underlined
region from positions 19 to 43). Although there is one amino acid insertion (Ala at position 52) in
the prepeptide sequence compared with other species (6,7), the hydrophobic segment is observed
in common, probably serving as an internal signal sequence. While we were preparing the
manuscript, the sequence of the cDNA encoding human enteropeptidase was reported, presenting the
possibility of a two-chain structure of the human enzyme (8). However, it is noteworthy that in
addition to the H and L-chains, a sequence similar to the rat and porcine M-chains is also observed
in the human sequence. The homology of the region is particularly high (88% vs. porcine and 83%
vs. human enzymes) compared with that in other regions (64-68% in the H-chain, 77-78% in the
L-chain). Thus, it is highly probable that human enteropeptidase is also a three-chain enzyme.
Among these three chains, the homology of the H-chain is the lowest due to insertions/deletions of
variable length around the Ser/Thr-rich regions, potential O-linked glycosylation sites. The rat
enzyme has 7 insertions (18 amino acids in total) and 50% of the inserted amino acids are Ser and
Thr residues, which are probably involved in O-linked carbohydrate attachment. The rat enzyme is
therefore considered to be the most O-linked carbohydrate-rich enteropeptidase among the
previously reported species. Furthermore, some of these inserted amino acids give rise to two additional
potential N-glycosylation sites, leading to heavy glycosylation of the region. The number of
potential N-linked glycosylation sites is variable depending on species (rat 20, human 18, bovine
19 and porcine 22), but their positions are almost conserved. These carbohydrate moieties are
presumably important to protect the enzyme from the access of other digestive proteases in the
intestinal content. The variety of the glycosylation sites observed among species may somehow be
related to the divergence in the environment in the intestinal lumen and physiology of digestion.
The common basic structure of the catalytic domain of serine proteases is also observed in the rat

FIG. 2. Nucleotide and deduced amino acid sequences of the rat enteropeptidase cDNA clone. Double underlined
sequence indicates a putative internal signal sequence. Boxed domains with (M), (H) and (L) are the deduced regions,
corresponding to the M, H and L-chains of the mature enzyme, respectively. The underlined sequence at 665-802bp is the
variable and Ser/Thr rich region, including 18 amino acid residues of insertions observed in the rat enzyme. Potential
N-linked glycosylation sites are indicated by closed boxes.








L-chain. Consistent with the previous data indicating that the enzyme activity is attained by the
L-chain alone, the homology of the region is high among different species (77-78%). There is,
however, an insertion of 4 amino acid residues in the sequence next to the catalytic triad of serine
proteinases, whereas the three basic amino acid residues, important to keep the substrate specificity
for the trypsinogen, are well conserved.

The expression of the enteropeptidase gene in various rat tissues was examined by Northern blot
analysis using the cloned full-length cDNA as a probe. As shown in Fig. 3, a signal of 4.4 kb
enteropeptidase mRNA was observed only in the duodenum, but not in the other parts of the
gastrointestinal tract from the esophagus to the colon, nor in other organs such as the brain, heart,
lung, liver, kidney and spleen. Since a comparable signal for G3PDH mRNA was observed in all
RNA samples analyzed, it is evident that the paucity of the enteropeptidase mRNA in the jejunum
and ileum was not caused by degradation of the RNAs. There is a controversy as to the
distribution of enteropeptidase; some of the previous reports indicated limited localization of
the enzyme in the duodenum (12), while others indicated distribution throughout the small intestine (9).
Thus, to further measure low levels of enteropeptidase gene expression semiquantitatively, we
employed the RT-PCR method and selected a primer set and amplification conditions with high
sensitivity and low background. Three PCR cycles were used for quantitative estimation. The
RT-PCR result of the RNA samples used in the Northern blotting is shown in Fig. 5. The PCR
product had a molecular size of 0.5 kb corresponding to the expected product of 491 bp and was
shown to hybridize with the rat enteropeptidase cDNA by Southern blotting (data not shown). A
strong signal was observed in the duodenum and also weak signals in the jejunum and ileum. The
signal detected in the ileum at 34 cycles was weaker than that of the duodenum at 30 cycles. Thus,
the mRNA level in the duodenum is considered to be at least 10 times higher than that in the distal
part of the small intestine, the ileum end. These results indicate gene expression of the enzyme
along the entire small intestine, though the level of expression is low in the distal segment.
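The cycle-number comparison above can be turned into a rough fold estimate. The following is a minimal sketch, assuming ideal exponential amplification in which the product multiplies by (1 + efficiency) each cycle; the function name and the `efficiency` parameter are illustrative and do not come from the paper.

```python
def min_fold_difference(cycles_weak: int, cycles_strong: int, efficiency: float = 1.0) -> float:
    """Lower bound on the template ratio (strong sample / weak sample).

    If the weak sample still gives a fainter band after `cycles_weak` cycles than
    the strong sample gives after `cycles_strong` cycles, the strong sample must
    have started with more than (1 + efficiency) ** (cycle gap) times the template.
    efficiency = 1.0 means perfect doubling per cycle (an assumption).
    """
    if cycles_weak < cycles_strong:
        raise ValueError("the weak sample is expected to need more cycles")
    return (1.0 + efficiency) ** (cycles_weak - cycles_strong)


# Ileum at 34 cycles versus duodenum at 30 cycles, as reported in the text.
print(min_fold_difference(34, 30))
```

Under perfect doubling the 4-cycle gap gives a 16-fold bound; with a lower assumed efficiency the bound shrinks, which is in line with the cautious "at least 10 times" wording of the text.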

Previous studies revealed relatively high enzyme activity throughout the rat small intestine (9). The
analysis of our samples by the same assay for the enzyme activity also gave essentially the same
result (Fig. 4A). However, it was indicated that their method also measured the coexisting
aminopeptidase activity (13). By including 2 mM EDTA in the reaction buffer, the activity of
aminopeptidase could be completely abolished, while that of enteropeptidase was not much
affected, at least 80% of the activity remaining (unpublished data). Thus, an approximate
estimate for the enteropeptidase level could be obtained by the method used in the presence of




FIG. 3. Northern blot analysis of total cellular RNA from various rat organs. 10 µg of total cellular RNA from the rat
esophagus (Es), stomach (St), duodenum (Du), jejunum (Je), ileum (Il), colon (Co), brain (Br), heart (He), lung (Lu), liver
(Li), spleen (Sp) and kidney (Ki) were separated on a 1.0% denaturing formaldehyde agarose gel, hybridized and washed
under the condition of high stringency using the rat enteropeptidase cDNA as a probe. The lines on the left indicate the
positions of the 28S and 18S ribosomal RNAs. The results of rehybridization of the filter with glyceraldehyde-3-phosphate
dehydrogenase (G3PDH) cDNA are shown at the bottom.




FIG. 4. Enteropeptidase activity along the rat small intestine. Small intestine from the duodenum to the ileum end was
divided into 8 equal segments, and the activity in each segment was measured as described in the Materials and Methods
section (A: without EDTA, B: with 2 mM EDTA in the reaction, respectively). Value of each segment indicates the
percentage of the enzyme activity when that in the duodenum (segment No. 1) is regarded as 100%.



2 mM EDTA, and the result of the measurements in the small intestine is shown in Fig. 4B. This
indicates the presence of high enzyme activity in the proximal segment of the small intestine, while
no enzyme activity was detected in the distal segment despite the high sensitivity of the method.
Taken together, the above-mentioned results clearly indicate that the biosynthesis of enteropeptidase
is regulated region-specifically both at the level of transcription and translation and that the main
place of the synthesis is the proximal segment of the small intestine, the duodenum, where
pancreatic secretions join the intestinal contents. The distribution of the mRNA and the enteropeptidase
activity strongly indicate the importance of the proximal small intestine in the physiology of
the intestinal protein digestion regulated by the enzyme.

FIG. 5. Enteropeptidase mRNA expression detected by RT-PCR in the rat esophagus (Es), stomach (St), duodenum
(Du), jejunum (Je), ileum (Il), colon (Co) and brain (Br). The amount of each cDNA sample included in the reaction was
adjusted to the same quantity according to the G3PDH mRNA expression. Three successive cycles were employed to
confirm the exponential amplification. Primers used for the amplification were as follows: (A) G3PDH, (B) and (C) rat
enteropeptidase H-chain specific primer.

In addition, faint signals of enteropeptidase mRNA were also observed in the stomach, colon and
brain at 40 cycles (Fig. 5C). The enzyme activity is undetectable in these organs and the physiological
importance of these findings remains to be elucidated. However, these findings are interesting
in context with the previous reports indicating the presence of trypsin-like serine proteases
in these tissues (14, 15). Trypsin-like serine proteases play important roles in many biological
processes. Especially in human brain, they are considered to be involved in the pathogenesis
of Alzheimer's disease, playing a role in β-amyloid production (15). Thus, the observed distribution
of the mRNA may indicate a role of enteropeptidase in the processing of bioactive peptides by
regulating the activity of trypsin-like proteases.

REFERENCES 

1. Kunitz, M. (1939) J. Gen. Physiol. 22, 429-446.
2. Liepnieks, J. J., and Light, A. (1979) J. Biol. Chem. 254, 1677-1683.
3. Baratti, J., Maroux, S., Louvard, D., and Desnuelle, P. (1973) Biochim. Biophys. Acta 315, 147-161.
4. Magee, A. I., Grant, D. A. W., and Hermon-Taylor, J. (1981) Clin. Chim. Acta 115, 241-254.
5. LaVallie, E. R., Rehemtulla, A., Racie, L. A., DiBlasio, E. A., Ferenz, C., Grant, K. L., Light, A., and McCoy, J. M. (1993) J. Biol. Chem. 268, 23311-23317.
6. Kitamoto, Y., Yuan, X., Wu, Q., McCourt, D. W., and Sadler, J. E. (1994) Proc. Natl. Acad. Sci. USA 91, 7588-7592.
7. Matsushima, M., Ichinose, M., Yahagi, N., Kakei, N., Tsukada, S., Miki, K., Kurokawa, K., Tashiro, K., Shiokawa, K., Shinomiya, K., Umeyama, H., Inoue, H., Takahashi, T., and Takahashi, K. (1994) J. Biol. Chem. 269, 19976-19982.
8. Kitamoto, Y., Veile, R. A., Donis-Keller, H., and Sadler, J. E. (1995) Biochemistry 34, 4562-4568.
9. Antonowicz, I., Hesford, F. J., Green, J. R., Grogg, P., and Hadorn, B. (1980) Clin. Chim. Acta 101, 69-76.
10. Gubler, U., and Hoffman, B. J. (1983) Gene 25, 263-269.
11. Kozak, M. (1984) Nucleic Acids Res. 12, 857-872.
12. Eggermont, E., Molla, A. M., Tytgat, G., and Rutgeerts, L. (1971) Acta Gastro-Enterologica Belgica 34, 655-662.
13. Jenö, P., Green, J. R., and Lentze, M. J. (1987) Biochem. J. 241, 721-727.
14. Jeohn, G. H., Serizawa, S., Iwamatsu, A., and Takahashi, K. (1995) J. Biol. Chem. 270, 14748-14755.
15. Wiegand, U., Corbach, S., Minn, A., Kang, J., and Müller-Hill, B. (1993) Gene 136, 167-175.




Exhibit 43 



The Journal of Biological Chemistry

© 1998 by The American Society for Biochemistry and Molecular Biology, Inc.

Vol. 273, No. 19, Issue of May 8, pp. 11895-11901, 1998

Printed in U.S.A.



Cloning and Characterization of the cDNA for Human Airway 
Trypsin-like Protease* 

(Received for publication, May 6, 1997, and in revised form, March 2, 1998)

Kazuyoshi Yamaoka‡§, Ken-ichi Masuda‡, Hiroko Ogawa‡, Ken-ichiro Takagi‡, Naoji Umemoto‡,
and Susumu Yasuoka¶

From the ‡Teijin Institute for Biomedical Research, 4-3-2 Asahigaoka, Hino, Tokyo 191 and the ¶Department of Nursing,
School of Medical Sciences, University of Tokushima, 3-18-15 Kuramotocho, Tokushima City, Tokushima 770, Japan



Previously we isolated a trypsin-like enzyme desig- 
nated human airway trypsin-like protease from the spu- 
tum of patients with chronic airway diseases. This paper 
describes the cDNA cloning, characterization of the pri- 
mary protein structure deduced from the cDNA, and 
gene expression of this enzyme in various human tis- 
sues. We obtained an entire 1517-base pair sequence of 
cDNA with an open reading frame encoding a polypep- 
tide with 418-amino acid residues. The polypeptide con- 
sisted of a 232-residue catalytic region and a 186-residue 
noncatalytic region with a hydrophobic putative transmembrane
domain near the NH2 terminus. The polypeptide
was suggested to be a type II integral membrane
protein in which the COOH-terminal catalytic region is 
extracellular. Therefore, this protein is thought to be 
synthesized as a membrane-bound precursor and to ma- 
ture to a soluble and active protease by limited proteol- 
ysis. It showed 29-38% identity in the sequence of the 
catalytic region with human hepsin, enteropeptidase, 
acrosin, and mast cell tryptase. The noncatalytic region 
had little similarity to other known proteins. In North- 
ern blot analysis a transcript of 1.9 kilobases was detect- 
able most prominently in the trachea among 17 human 
tissues examined. 



Many previous investigations have indicated that proteases 
released from immunoinflammatory cells participate in patho- 
genesis of several kinds of respiratory diseases. For instance, 
neutrophil elastase has been shown to be intimately related to 
the pathologic states of pulmonary emphysema (1, 2), cystic 
fibrosis (3, 4), interstitial pneumonia (5), and adult respiratory
distress syndrome (6) through destruction of extracellular ma- 
trix components, such as elastin, of alveolar and bronchial 
tissues. Mast cells, which abound in airway mucosa and in 
alveolar wall, release trypsin-like protease (tryptase) and
chymotrypsin-like protease (chymase) into extracellular spaces
during degranulation (7). The tryptase has potential to stimu- 
late smooth muscle, fibroblast, and tissue turnover (8). Differ- 
ent substrates for chymase (9—11) indicate the potential in- 
volvement of the enzyme in a variety of processes related to the 
inflammatory response. Recently it was revealed that chymase 



* This work was supported by Grant-in-aid for Developmental Scien- 
tific Research 09670616 of the Ministry of Education, Science, Sports, 
and Culture, Japan. The costs of publication of this article were de- 
frayed in part by the payment of page charges. This article must 
therefore be hereby marked "advertisement" in accordance with 18
U.S.C. Section 1734 solely to indicate this fact. 

The nucleotide sequence(s) reported in this paper has been submitted 
to the GenBank™/EBI Data Bank with accession number(s)
AB002134. 

§ To whom correspondence should be addressed. Tel.: 81-425-86-8135;
Fax: 81-425-87-5516; E-mail: ymo35291@tokcnl.teijin.co.jp.



from human mast cells selectively converted big endothelins to 
trachea-constricting peptides (12). These effects of the two 
mast cell proteases have attracted considerable attention as 
one of the pathogenic determinants and the therapeutic targets 
of bronchial asthma and allergic inflammation. Elastase re- 
leased from alveolar macrophages has also been suggested to 
contribute to the pathogenesis of pulmonary emphysema by 
degrading matrix components of alveolar walls (13, 14). 

However, there are very few reports dealing with the func- 
tions and roles of proteases secreted from respiratory tissues, 
such as secretory glands or surface epithelial cells of the air- 
way. Kido and co-workers (15, 16) found a novel trypsin-like 
protease that is secreted from rat Clara cells, secretory cells 
localized to the distal airway only. The protease, named 
tryptase Clara, was shown to enhance the infectivity of influ- 
enza and Sendai viruses (17), although its physiological role is 
unknown. 

Previously, we found trypsin-like activity in the sputum of 
patients with chronic airway diseases and isolated a novel 
trypsin-like protease from the sputum, designated human airway
trypsin-like protease (HAT)¹ (18). Gel filtration studies
showed that HAT was a monomeric enzyme with an apparent 
molecular mass of 27 kDa. Immunohistochemical studies 
showed that HAT was localized mainly in cells of submucosal 
serous glands of the bronchi and trachea. These results indi- 
cate that HAT is released from the submucosal serous glands 
onto mucous membrane, at least in patients with chronic air- 
way diseases. 

In this paper, we report the cloning of HAT cDNA, the 
primary structure of this enzyme and characterization of the 
polypeptide deduced from the nucleotide sequence of the cDNA, 
and results of analysis of expression of HAT mRNA in various 
human tissues. The primary structure of HAT was compared 
with that of other known serine proteases. 

EXPERIMENTAL PROCEDURES 

Materials — Human trachea QUICK-Clone™ cDNA, human trachea
poly(A)⁺ RNA, human trachea λgt10 cDNA library (oligo(dT) and
random-primed), 5'-RACE kit, human multiple tissue Northern blots, and
human β-actin cDNA were purchased from CLONTECH Laboratories
Inc. (Palo Alto, CA). Taq DNA polymerase was from Promega Corp.
(Madison, WI). SureClone™ ligation kit, dNTP, and plasmid vector
pUC18 were from Amersham Pharmacia Biotech. Avian myeloblastosis
virus reverse transcriptase and RNase inhibitor were from Boehringer
Mannheim. Restriction endonucleases, random primer labeling kit, and
Escherichia coli JM109 were from Takara Shuzo Co., Ltd. (Otsu, Japan).
Nylon membrane Hybond™-N+ for blotting and [α-³²P]dCTP for probe
labeling in hybridization were from Amersham. Denhardt's solution
and salmon sperm DNA were from Wako Pure Chemical Industries Ltd.
(Osaka, Japan). Qiagen lambda kit for purification of phage DNA was



¹ The abbreviations used are: HAT, human airway trypsin-like protease;
PCR, polymerase chain reaction; RACE, rapid amplification of
cDNA ends; bp, base pair(s); kb, kilobase or kilobase pair.



This paper is available on line at http://www.jbc.org 






cDNA Cloning of Human Airway Trypsin-like Protease 



from Qiagen GmbH (Hilden, Germany). Oligonucleotide purification
cartridge column and DyeDeoxy™ terminator cycle sequencing kit for
sequencing of DNA were from Applied Biosystems Inc. (Foster City,
CA).

DNA Amplification by Polymerase Chain Reaction (PCR) — PCR was
performed according to the procedure described by Sambrook et al. (19).
Oligonucleotides used as PCR primers were synthesized by a DNA/RNA
synthesizer (Applied Biosystems Inc., model 394) and purified by
oligonucleotide purification cartridge column. Unless otherwise stated, PCR
was carried out by adding 15 pmol of each primer and an appropriate
amount of template DNA to 20 µl of PCR buffer (10 mM Tris-HCl, pH
9.0, 50 mM KCl, 1.5 mM MgCl2, 1% Triton X-100) containing 0.5 units of
Taq DNA polymerase and 0.2 mM dNTP. The reaction using a DNA
thermal cycler (Perkin-Elmer Corp.) was carried out for 35 cycles of
1-min denaturation at 94 °C, 1.5-min annealing at 57 °C, and 2-min
extension at 72 °C.
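The quantities in the standard reaction above can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the cycling time counts the programmed hold times alone, ignoring temperature ramping and any initial or final steps, which the text does not describe.

```python
def primer_concentration_um(primer_pmol: float, volume_ul: float) -> float:
    """Primer concentration in micromolar; pmol per microliter equals micromol per liter."""
    return primer_pmol / volume_ul


def cycling_minutes(cycles: int, step_minutes: tuple) -> float:
    """Total programmed hold time, excluding ramp times (an assumption)."""
    return cycles * sum(step_minutes)


# 15 pmol of each primer in a 20-µl reaction.
print(primer_concentration_um(15, 20))

# 35 cycles of 1 min at 94 °C, 1.5 min at 57 °C, and 2 min at 72 °C.
print(cycling_minutes(35, (1.0, 1.5, 2.0)))
```

With the stated values each primer is at 0.75 µM and the programmed holds add up to 157.5 min.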

Subcloning of DNA Fragments — To subclone DNA fragments that
were amplified by PCR, SureClone™ ligation kit was used. DNA fragments
were blunted by Klenow fragment, inserted into the SmaI site of
plasmid vector pUC18, and introduced into E. coli JM109 by Hanahan's
method (20). On the other hand, for subcloning of insert DNA of λgt10
phage clone, the insert DNA was excised by EcoRI from phage DNA,
which was purified using Qiagen lambda kit and inserted into the
EcoRI site of plasmid vector pUC18. E. coli JM109 was transformed as
described above. Plasmid DNA was isolated from each transformant by
the alkaline lysis procedure (21) with minor modifications.

Analysis of DNA and Amino Acid Sequence — The nucleotide sequence
of the DNA inserted into plasmid vector pUC18 was analyzed by
an automated DNA sequencer (Applied Biosystems Inc., model 373)
using the DyeDeoxy™ terminator cycle sequencing kit. Both strands of
all clones were completely sequenced. Hydropathy of amino acid sequence
was analyzed (22) with the Genetyx program package (Software
Development Co. Ltd., Tokyo, Japan). A computer survey of the
National Biomedical Research Foundation (Washington, D.C.) and
SWISS-PROT (European Bioinformatics Institute, Geneva, Switzerland)
data banks for similarity of amino acid sequences between HAT
and other known proteins was carried out using the MPsrch program,
which was modified from the method of Smith and Waterman (23) with
Teijin Systems Technology Ltd. (Yokohama, Japan).
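For readers unfamiliar with the Smith and Waterman method named above, the following is a minimal sketch of its local-alignment scoring recurrence. It is not the MPsrch program: the simple match/mismatch scores and the linear gap penalty are assumptions chosen for illustration, whereas a real protein search would use a substitution matrix.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between sequences a and b.

    Each cell holds the best score of an alignment ending at that pair of
    positions, floored at zero so that an alignment may start anywhere.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The first ten NH2-terminal residues of native HAT compared with themselves.
print(smith_waterman_score("ILGGTEAEEG", "ILGGTEAEEG"))
```

Because the score is floored at zero, unrelated flanking residues do not penalize a well-matched core, which is why the method suits comparing a catalytic region against longer database entries.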

Amplification of a Partial cDNA Fragment — In a previous report
(18), we showed that the sequence of the 20 NH2-terminal amino acids
of native HAT purified from the sputum of patients with chronic airway
diseases was ILGGTEAEEGSWPWQVSLRL (amino acids 187-206 in
Fig. 1). Based on this amino acid sequence, we designed and synthesized
two kinds of degenerate PCR primers, namely
5'-ATCYTNGGRGGNACNGAGGC-3'² (sense) and 5'-ARKCKMAGGCTSACYTG-3'² (antisense),
to obtain the 59-bp cDNA fragment encoding the front 19
residues of the NH2-terminal amino acid sequence by PCR. PCR was
carried out in the reaction mixture containing 5 pmol of each primer
and 1 ng of cDNA derived from human trachea (QUICK-Clone™
cDNA). The amplified DNA fragment was then subcloned and sequenced
as described above. The analysis of the sequence showed that
a 59-bp DNA fragment encoding the 19-residue amino acid sequence
corresponding to the NH2 terminus of the purified HAT was produced
by this PCR.
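The degenerate antisense primer above can be examined programmatically. The sketch below is a hedged illustration: it uses the standard IUPAC two-base ambiguity codes (R, Y, K, M, S), the helper names are hypothetical, and the sense primer is left out because its N positions are defined by the paper's own footnote rather than by the plain IUPAC meaning.

```python
from itertools import product

# Standard IUPAC codes for the symbols that occur in the antisense primer.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG"}
# Complement of each symbol: R (A/G) pairs with Y (T/C), K (G/T) with M (C/A), S with S.
COMPLEMENT = str.maketrans("ACGTRYKMS", "TGCAYRMKS")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a sequence that may contain ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


def expand(seq: str) -> list:
    """Every unambiguous sequence represented by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[s] for s in seq))]


antisense = "ARKCKMAGGCTSACYTG"
print(reverse_complement(antisense))  # coding-strand view of the primer
print(len(expand(antisense)))         # number of distinct oligonucleotides in the pool
```

Read in codons, the coding-strand view begins CAR GTS AGC CTK, which corresponds to Gln-Val-Ser-Leu, the start of the QVSLRL stretch at the end of the 20-residue NH2-terminal sequence.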

Amplification of cDNA by 3'-Rapid Amplification of cDNA Ends
(RACE) — To obtain a cDNA that had a nucleotide sequence in the
downstream side of the 59-bp DNA fragment, we employed the 3'-RACE
method developed by Frohman et al. (24). Two kinds of sense primers
were used to amplify the cDNA specifically and effectively. These primers
were designed and synthesized based on the nucleotide sequence of
the 59-bp cDNA fragment. At first, single-stranded cDNAs were synthesized
by reverse transcription at 42 °C for 60 min in 20 µl of reaction
buffer (50 mM Tris-HCl, pH 7.6, 60 mM KCl, 10 mM MgCl2, 1 mM
dithiothreitol) containing 10 ng of human trachea poly(A)⁺ RNA, 115
pmol of (dT)17-adapter primer 5'-GACTCGAGTCGACATCGA(dT)17-3',
25 units of RNase inhibitor, 1 mM dNTP, and 40 units of avian
myeloblastosis virus reverse transcriptase. One-tenth of the reaction mixture
was used as a template in the first-round PCR in which
5'-ATCTTGGGGGGGAGGGAGGGTGA-3' and the adapter primer
5'-GACTCGAGTCGACATCGAT-3' were used as the sense and antisense primers,
respectively. For further amplification of the cDNA, the second-round
PCR was carried out using one-fortieth of the first-round PCR reaction
mixture as the template with 5'-GAGGCTGAGGAGGGAAGCTGGGCGT-3'
(nucleotides 635-659 in Fig. 1) and the (dT)17-adapter primer
described above as the sense and antisense primers, respectively. The
cDNA amplified by 3'-RACE was then subcloned and sequenced.

² Y represents T or C; N represents C or I (inosine); R represents G or
A; K represents G or T; M represents A or C; S represents G or C.

Screening of cDNA Library — Plaque hybridization against human
trachea cDNA library was performed according to the standard procedure
(19). The DNA fragment obtained by 3'-RACE was labeled by the
random prime method (25) using [α-³²P]dCTP and random primer labeling
kit. Using this probe, 1 × 10⁶ plaques derived from human
trachea λgt10 cDNA library were screened by hybridization as follows.
The blots for the plaques were hybridized with the probe at 65 °C
overnight (16-20 h) in a solution containing 5× SSPE buffer (0.75 M
NaCl, 50 mM NaH2PO4, 5 mM EDTA, pH 7.4), 5× Denhardt's solution,
0.1% SDS, and 100 µg/ml denatured salmon sperm DNA. These blots
were then washed twice at 65 °C for 20 min with 0.1× SSPE buffer
containing 0.1% SDS. Five positive clones were selected and
plaque-purified, and the insert DNAs of these clones were then subcloned and
sequenced.

Amplification of cDNA by 5'-RACE — To obtain a cDNA that had a
nucleotide sequence in the upstream side of the cDNA coding for native
HAT, amplification of the cDNA was carried out using 5'-RACE kit (24).
Single-stranded cDNAs were synthesized by reverse transcription of 2
µg of human trachea poly(A)⁺ RNA using the antisense primer
5'-AGGTGGCAATGCAGTGACGAGGATT-3' (nucleotides 785-761 in Fig.
1). The single-stranded cDNAs were purified using glass powder in
5'-RACE kit after alkaline hydrolysis of RNA in the reaction mixture.
Using T4 RNA ligase, AmpliFINDER™ anchor was ligated to the
3'-ends of the single-stranded cDNAs. PCR amplification (0.75 min at
94 °C, 0.75 min at 57 °C, and 2 min at 72 °C) was then carried out using
0.01 of the ligation mixture as template, with anchor primer
5'-CTGGTTGGGGGCAGCTCTGAA<^TTCCAGAATCGATAG-3' and
5'-TGAGCTGGTGTGAGGATGCACATGT-3' (nucleotides 741-717 in Fig. 1) as
the sense and antisense primers, respectively. The cDNA amplified by
5'-RACE was then subcloned and sequenced.

Expression and Purification of Recombinant HAT — A 1.3-kb
BamHI-HindIII fragment containing the entire HAT cDNA was cloned into the
transfer vector pBlueBacIII (Invitrogen, San Diego, CA) to generate
pBacPHAT1. Recombinant HAT-expressing viruses were generated after
co-transfection of Sf9 cells with pBacPHAT1 and wild-type AcMNPV
DNA essentially as described by the manufacturer (Invitrogen). For
baculovirus/insect cell expression (26), 800 ml of Tn5 (27) cells were
then infected with the high titer lysate for 72 h and harvested by
centrifugation. The cell pellet was treated with 1% Triton X-100 for 1 h
on ice and was centrifuged at 100,000 × g for 1 h at 4 °C. From this
infected cell lysate, the recombinant HAT was isolated by sequential
chromatographic procedures of the native HAT purification described
previously (18). SDS-polyacrylamide gel electrophoresis, immunoblotting,
and degradation of fibrinogen by HAT were done as described (18).

Northern Blot Analysis — The expression level of HAT mRNA in various
human tissues was examined by Northern blot analysis. To prepare
the probe for the analysis, the full-length cDNA for HAT was
³²P-labeled by random priming (25) and hybridized as follows. Northern
blots of various human tissues, which contained 2 µg of poly(A)⁺ RNA
derived from various tissues in each lane, were probed under the same
conditions as the library screening described above (except that the
concentration of SDS was 0.5%) and then washed. In the case of the blot
for trachea, 2 µg of human trachea poly(A)⁺ RNA was resolved by 1%
agarose-formaldehyde gel electrophoresis (28), transferred onto
Hybond™-N+ blotting membrane, and UV-cross-linked. X-ray films
were exposed to the probed blots for 4 days at −80 °C with an intensifying
screen, and the presence of HAT mRNA in each human tissue was
evaluated. These blots were then stripped of the HAT cDNA probe by
boiling in 0.5% SDS for 10 min and re-probed with ³²P-labeled human
β-actin control probe as an internal standard for the amounts of RNA
loaded.

RESULTS AND DISCUSSION 

Cloning of HAT cDNA — Using a pair of highly degenerate 
oligonucleotide primers, the partial 59-bp cDNA fragment for 
HAT, which contained a nucleotide sequence coding for the 
NH2-terminal 19-residue amino acid sequence of the native
HAT, was obtained by PCR amplification from human trachea 
cDNA. To stretch this cDNA sequence to the 3'-end, a 3'-RACE 
reaction was carried out. The resulting 0.9-kb amplified prod- 
uct was shown to encompass the entire nucleotide sequence of 
the 3' region, including the poly(A) tail of HAT cDNA (nucleo- 
tides 635-1517 in Fig. 1). The amino acid sequence deduced 






GACTOXi^ATCTCAAAGCACrmylCr^AOOCAGAAAAAAGAACV 60 

AATgTATACgCCACCACXrroTAACTTOGACTTCW^ 12 0 

1 MYRPARVTSTS R I F I. N P Y V 

21 I P I V V A n V V T T. A V T I A I. T. V Y P 1 

immmilCATCAAAAATCTrACTTTTATA 240 
41 I L A Pi DQKSYFYRSSFQLLNVE 

ATATAATAG?CACTTAAATTCACCACCTACACAGGAATACAGGACTTTC»^ 300 
61 YNSQLNSPATQEYRTLSCRl 

TGAATXriXrKyVTTACTAAAACATTCAAAGAATCAAATTT^ 360 
61 ESLITKTFKB5NLRNQFIRA 

101 HVAKLRQDGSGVRADVVMKP 

TCAATTCACTAGAAATAACAATGGACCATCAATGAAAACXAGAATTCAGTOT 480 
121 QFTRNNNCASMKSRIESVLR 

ACAAATCCrrcAATAACTCTKAAACCTCXSAAAlW^CCCTT^AA^ SdO 
141 QWLNNSGNLEINPSTBITSL 

TACroACCACGCTGCACCyu^ATO3GCTTATTAATGAATCyreO0(^^ 600 
161 TDQAAANWLINECGAGPDLI 

AACATTGTCTGAGCAGAGAATCCTTCCAGGCACTCW3GCTGAGGAGGGAAG<^^ 660 
181 T L S fi Q R ILGG TEAB R G S W P W 

GCAAGTCACTCTCKXXMrrcAATAATCXXCACCACTGTOGAGCC^^ 720 
201 Q V S L R L HNAHHCGGSL.INNM 

CTGGATCCTCACACCAGCTCACTGCTTCAGAAOCAACTCTAAT^^ 780 
221 WILTAAHCFRSNSNPRDWIA 

CACXnxriGGTATTTXXrACAACATTrcCTAAACTAAGAATGAGAGTAAGAAATAT^^ 840 
241 TSGISTTFPKLRMRVRNILI 

TCATAACAATrATAAATCrrGCAACTCAT3AAAATClACATTGCJW:ri^^ 900 
261 HNNYKSATHENDIALVRLBN 

CAGTCTCACCTTTArcAAAGATATCCAmrKnOTCriC^ 960 
281 SVTFTKDIHSVCLPAATQNI 

TC»CClGGCTCTACTGCrTATGTAACAGGATGGGGCGCTCAAG^ 1020 
301 ppGSTAYVTCWGAQEYAGHT 

AGTTXX:AGACXrrAAGGCAAGGACACWTCAGAATAATAAGTAATGATCTATG'?AATGW 1080 
321 VPBLRQGQVRIISNDVCNAP 

ACATAGTTATAATCWAGCCATCTTCTrCTGGAATGCTGTGTGCTGGAGTAC^^ 1140 
341 HSYNGAILSGMLCAGVPQGG 

AGTXXJACGC ATXrrc AGGGTGACTCTGGTT3GCCCACTAGTAC AAGAAG AC^ 1200 
361 VDACQGDSGGPLVQEDSRRL 

TTCGTTTA'ITCTCGGG ATACT AAGCTGGCGAGATCAGTCTGGCCTOCCGGATAA 1260 
381 WFIVGIVSWGDQCGLPDKPG 

AGTGTATACTCGAGTG ACAGCCrACCTTCACTGGATTAGGC AACAAACTGGGATC^ 1320 
401 VYTRVTAYLDWIRQQTGI • 

CAACAACrroCATCXX7n7ITXXyUU«riCTGT^ 1380 

CTrTACATTTCAACTCAAAAAGAAACrrAGAAATGTCCTAATTTAACATC^^ 1440 

ATA UXX r i ' IU 'AACaAACACTCTTrAACCTITCTTTArrATTAAAGGl^^ ISOO 

AAAAAAAAAAAAAAAAA 1517

Fig. 1. Nucleotide sequence of HAT cDNA and its deduced 
amino acid sequence. The nucleotide sequence of the HAT cDNA is 
shown along with the deduced amino acid sequence beginning with the 
first ATG codon. A stop codon (TAG) at the terminus of the translation 
sequence is marked with an asterisk. Nucleotides are numbered at the 
right margin and amino acids on the left. The NH2-terminal sequence
obtained from the purified enzyme is underlined. The boxed amino acid 
sequence represents a potential transmembrane domain. 

from this 0.9-kb fragment was shown to exactly contain the 15-amino
acid sequence (amino acids 192-206 in Fig. 1) of the
NH2-terminal 20-amino acid sequence of the native HAT. With
this 0.9-kb cDNA fragment as a probe, 1 × 10⁶ clones of a
human trachea λgt10 cDNA library were screened. Five of 28
independent positive clones were then subcloned and sequenced.
The largest insert was shown to contain a 1323-bp
sequence of cDNA (nucleotides 133-1455 in Fig. 1) but was
considered not to contain the entire nucleotide sequence of the
5' region of HAT cDNA. To obtain the missing sequences in the
5' region of HAT cDNA, 5'-RACE reaction was carried out. The
5'-RACE procedure produced a 741-bp cDNA fragment (nucleotides
1-741 in Fig. 1). This product had a 609-bp nucleotide
sequence overlapping (nucleotides 133-741 in Fig. 1) with the
5'-end of the largest insert of cDNA clone obtained by the cDNA
library screening.
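The fragment coordinates reported above can be cross-checked with a few lines of interval arithmetic. This is a minimal sketch: the coordinates are taken from the text, while the dictionary keys and helper names are illustrative.

```python
# Inclusive nucleotide coordinates in Fig. 1, as stated in the text.
FRAGMENTS = {
    "5'-RACE product": (1, 741),
    "largest library insert": (133, 1455),
    "3'-RACE product": (635, 1517),
}


def length(span: tuple) -> int:
    """Number of nucleotides in an inclusive span."""
    return span[1] - span[0] + 1


def overlap(a: tuple, b: tuple) -> int:
    """Number of nucleotides shared by two inclusive spans."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def covers(spans, start: int, end: int) -> bool:
    """True if the spans together cover every position from start to end."""
    reach = start - 1
    for s, e in sorted(spans):
        if s > reach + 1:
            return False  # a gap before this span
        reach = max(reach, e)
    return reach >= end


for name, span in FRAGMENTS.items():
    print(name, length(span))
print(overlap(FRAGMENTS["5'-RACE product"], FRAGMENTS["largest library insert"]))
print(covers(FRAGMENTS.values(), 1, 1517))
```

The computed lengths (741, 1323, and 883 bp) and the 609-bp overlap agree with the figures in the text, and the three fragments together span nucleotides 1-1517 without a gap.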

Sequence and Structural Features of HAT cDNA — Analysis
of the cDNA clones obtained by the successive procedures including
3'-RACE, cDNA library screening, and 5'-RACE
showed a 1517-bp nucleotide sequence up to the poly(A) region
(Fig. 1), which represented the HAT cDNA sequence. This
nucleotide sequence was also shown to contain one open reading
frame, and the polypeptide deduced from the cDNA included
the 20-residue amino acid sequence of the NH2 terminus
of the native HAT (amino acids 187-206 in Fig. 1). The molecular
mass of the polypeptide, including the NH2 terminus of the
20 residues to the COOH terminus deduced from the stop codon
TAG (nucleotide 1316), was estimated to be 25,308 Da. This
value is similar to the apparent molecular mass (27 kDa) estimated
by gel filtration of the native HAT protein purified from
sputum (18).

[Immunoblot, lanes 1-4; marker positions: 103, 77.0, 48.0, 34.2, 28.4, 20.5]

Fig. 2. Immunoblotting of the native HAT and the recombinant
HAT. Specific binding was analyzed using the antibody against a
peptide corresponding to the NH2-terminal 19 amino acids of HAT as
described previously (18). Lane 1, standard proteins; lane 2, purified
native HAT (0.10 µg); lane 3, lysate of infected Tn5 cells derived from a
20-µl culture; and lane 4, purified recombinant HAT (0.10 µg).

In the 5 '-flanking region of this cDNA, one in-frame stop 
codon TAG was located at nucleotide 26. Four in-frame ATG 
codons were detectable between this stop codon and the region 
encoding the native HAT, but none of these ATG codons satis- 
fied the criteria for a Kozak consensus sequence (29). Therefore 
we could not determine the translational initiation site in the 
cDNA from the nucleotide sequence. To determine the initia- 
tion site, we expressed recombinant HAT in a baculovirus/ 
insect cell system using the HAT cDNA. The recombinant virus 
containing the HAT cDNA was isolated, and the insect cell Tn5 
was infected with the virus and then cultured. The lysate 
obtained by 1% Triton X-100 treatment of the infected cells was 
analyzed by immunoblotting with a rabbit antibody against a 
peptide corresponding to the NH2-terminal 19-amino acid
sequence of the native HAT (18) as primary antibody, and the
immunoblotting indicated that the infected cells biosynthe- 
sized a protein with a molecular mass of 48 kDa as a main 
product (Fig. 2). The molecular mass of each polypeptide, de- 
duced from the nucleotide sequence initiating from each of 4 
ATG codons in the cDNA, was 46,263, 32,933, 31,436, and 
30,107 Da, respectively. The molecular mass of 46,263 Da is the 
most similar to that of the recombinant protein expressed in 
the insect cells, suggesting that the ATG located nearest the 
5'-end (at nucleotide 62) is the initiation codon of HAT. 
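
The reasoning in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four masses and the 48-kDa band are the values quoted above, while the helper name `closest_candidate` and the "nearest mass" criterion are hypothetical simplifications of the authors' comparison.

```python
# Masses (Da) deduced in the text for translation starting at each of the
# four in-frame ATG codons, listed in 5'-to-3' order.
CANDIDATE_MASSES_DA = [46263, 32933, 31436, 30107]
OBSERVED_MASS_DA = 48000  # main immunoblot product, about 48 kDa (Fig. 2)


def closest_candidate(candidates, observed):
    """Return (index, mass) of the candidate nearest the observed mass.

    Hypothetical helper: "nearest absolute difference" is an assumed
    stand-in for the qualitative comparison made in the text.
    """
    return min(enumerate(candidates), key=lambda item: abs(item[1] - observed))


index, mass = closest_candidate(CANDIDATE_MASSES_DA, OBSERVED_MASS_DA)

# Coordinates implied by the chosen start codon (1-based, as in Fig. 1).
START_NT = 62   # first nucleotide of the initiating ATG
CODONS = 418    # residues in the deduced precursor
stop_codon_nt = START_NT + 3 * CODONS  # first nucleotide of the stop codon

print(index, mass, stop_codon_nt)
```

Under these assumptions the first ATG is selected, and the stop codon position computed from nucleotide 62 and 418 codons agrees with the TAG position given in the text.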

To demonstrate that the cloned enzyme has the same activity 
as the native HAT, the recombinant HAT that was expressed in 
the baculovirus/insect cell system was isolated in its active 
form. The minor product in Fig. 2, lane 3, was isolated selectively
as the active recombinant HAT from the infected cell
lysate by the sequential chromatographic procedures used for
purification of the native HAT (18). The purified recombinant
enzyme has a molecular mass of 28 kDa on SDS-polyacrylamide gel
electrophoresis and the same 10 NH2-terminal residues as the
native HAT. Immunoblotting also showed the purified recombinant
enzyme to be the same size as the native HAT (Fig. 2). The
recombinant HAT had an enzymatic activity degrading fibrinogen,
especially the α-chain (Fig. 3), similar to the native HAT.
From these results, it was established that the isolated cDNA 
clone encodes HAT. 

Based on these results, the nucleotide sequence of the cDNA 
for HAT (Fig. 1) was summarized as follows. The cDNA in- 
cludes 1254 nucleotides coding for 418 amino acids and two 
untranslated nucleotide sequences composed of 61 and 185 
nucleotides at the 5'-end and 3'-end, respectively. In the
3'-untranslated region, there is a polyadenylation signal
sequence, ATTAAA, at nucleotides 1478-1483, 17 nucleotides
distant from the poly(A) tail.
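
As a rough illustration of how such a signal can be located, the sketch below scans a sequence for the two common polyadenylation hexamers and reports 1-based coordinates. The toy sequence and the function name `find_polya_signals` are hypothetical; the toy is not the HAT cDNA and was built only so that the signal sits 17 nucleotides upstream of the poly(A) run, mirroring the spacing described above.

```python
# Common polyadenylation signal hexamers (the text reports ATTAAA for HAT).
SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_signals(seq, signals=SIGNALS):
    """Return sorted (start, end, hexamer) tuples, 1-based and inclusive."""
    seq = seq.upper()
    hits = []
    for hexamer in signals:
        pos = seq.find(hexamer)
        while pos != -1:
            hits.append((pos + 1, pos + len(hexamer), hexamer))
            pos = seq.find(hexamer, pos + 1)
    return sorted(hits)


# Hypothetical toy 3' end: 4 nt, the signal, a 17-nt spacer, then poly(A).
TOY_3PRIME = "GGCC" + "ATTAAA" + "C" * 17 + "A" * 12

hits = find_polya_signals(TOY_3PRIME)
tail_start = TOY_3PRIME.index("A" * 12) + 1   # 1-based start of the poly(A) run
spacer = tail_start - hits[-1][1] - 1          # nucleotides between signal and tail

print(hits, tail_start, spacer)
```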

Fig. 3. Degradation of human fibrinogen by the native HAT
and the recombinant HAT. Hydrolyzing reaction and SDS-polyacrylamide
gel electrophoresis were done as described previously (18). For
each reaction, 0.10 µg of HAT was used. Lane 1, standard proteins; lane
2, fibrinogen (blank control); lane 3, fibrinogen hydrolyzed by native
HAT; lane 4, fibrinogen hydrolyzed by recombinant HAT.
[Gel image; lanes 1-4; size markers 45.0, 31.0, 21.5, and 14.4 kDa;
bands labeled α-chain, β-chain, and γ-chain.]

Fig. 4. Hydropathy plot of the deduced amino acid sequence of HAT.
The method of Kyte and Doolittle (22) was used with averaging over a
window of 10 residues. Hydrophobic residues show positive values,
whereas hydrophilic residues show negative values. Amino acid
numbering begins with the start codon Met.
[Plot; vertical axis from -3.0 (hydrophilic) to 3.0 (hydrophobic).]

Analysis of Deduced Amino Acid Sequence of HAT — The open
reading frame of HAT cDNA was thought to encode a polypeptide
consisting of 418 amino acid residues, thus having a
molecular mass of 46,263 Da. The NH2-terminal 20-amino acid
sequence of the native HAT extends from Ile187 to Leu206 in the
sequence of the deduced polypeptide (Fig. 1). This result indicates
that the Arg186-Ile187 peptide bond in the HAT polypeptide
should be cleaved for activation of HAT. This type of
cleavage has been shown to be a relatively common step for 
activation of many known serine protease zymogens (30, 31). 
Therefore it is likely that the HAT gene product is synthesized 
as a precursor protein that consists of a noncatalytic region 
with 186 amino acid residues (20,955 Da, amino acids 1-186 in 
Fig. 1) and a catalytic region with 232 amino acid residues 
(25,308 Da, amino acids 187-418 in Fig. 1) and that the pre- 
cursor is converted to an active enzyme by limited proteolysis 
like trypsinogen to trypsin in the small intestine (32). In this 
noncatalytic region, there were two potential N-linked glycosylation
sites, namely Asn-Asn-Ser and Asn-Pro-Ser, at Asn144
and Asn152, respectively.

A hydropathy plot (22) of the predicted amino acid sequence 
of HAT precursor (Fig. 4) showed that a typical NH2-terminal
signal sequence (33-35) is not present, but a single obvious
hydrophobic region (amino acids 13-43 in Fig. 1) is present
near the NH2 terminus. This hydrophobic region consisting of
31 amino acid residues does not contain any charged amino
acids and is flanked by charged amino acids (Arg12 and Asp44).
This internal hydrophobic region is thought to correspond to a
transmembrane domain that anchors the protein to the cell
membrane (36). A generalized rule in the eucaryotic transmembrane
proteins (37, 38) suggests that the difference in total
charge between 15-residue sequences on either side of the
membrane-spanning hydrophobic region determines the orientation
of the protein, with the more positive side facing the
cytosol. As for the precursor polypeptide deduced from HAT
cDNA, the NH2-terminal side of the hydrophobic region had a
net charge of +3, whereas the opposite side had that of +1. The
NH2-terminal side was therefore more positive, by 2 charge units,
than the COOH-terminal side. This result suggests that HAT
precursor has an intracellular NH2-terminal tail region consisting
of 12 amino acid residues facing the cytosol and an extracellular
COOH-terminal region consisting of 375 amino acid residues
and containing the catalytic region. Therefore, the HAT
precursor can be classified as a type II integral membrane
protein (39, 40) and is thought to be synthesized as a mem- 
brane-bound precursor protein translocated to the cell surface, 
processed to a soluble form, and released. 
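
The two analyses used in this paragraph, a windowed Kyte-Doolittle hydropathy average and a comparison of net charge in the 15 residues flanking the hydrophobic segment, can be sketched as follows. This is a minimal illustration, not the authors' software. The toy sequence, the function names, and the choice to count only Lys/Arg as positive and Asp/Glu as negative are assumptions made for the example.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=10):
    """Mean hydropathy of every full window of `window` residues."""
    scores = [KD[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def net_charge(segment):
    """Lys/Arg count +1, Asp/Glu count -1; His is ignored (an assumption)."""
    segment = segment.upper()
    positive = sum(segment.count(aa) for aa in "KR")
    negative = sum(segment.count(aa) for aa in "DE")
    return positive - negative


def flank_charges(seq, tm_start, tm_end, flank=15):
    """Net charge of up to `flank` residues on each side of a segment.

    tm_start and tm_end are 1-based, inclusive residue numbers.
    """
    n_side = seq[max(0, tm_start - 1 - flank):tm_start - 1]
    c_side = seq[tm_end:tm_end + flank]
    return net_charge(n_side), net_charge(c_side)


# Hypothetical toy protein: short charged tail, 20 hydrophobic residues,
# then an acidic stretch. It is NOT the HAT sequence.
TOY = "MKRAKS" + "LIVALLVAIL" * 2 + "DESGTEAD"

profile = hydropathy_profile(TOY)
n_charge, c_charge = flank_charges(TOY, tm_start=7, tm_end=26)
# The more positive flank is predicted to face the cytosol.
orientation = ("N-terminus cytosolic" if n_charge > c_charge
               else "C-terminus cytosolic")

print(round(max(profile), 2), n_charge, c_charge, orientation)
```

For the toy sequence the NH2-terminal flank is the more positive one, which under the rule cited above corresponds to a type II orientation of the kind proposed for HAT.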

Because neither the precursor nor intermediate form of HAT
has been isolated and characterized, it is unknown whether or
not the membrane-bound HAT is active on the cell surface. The
mechanisms of expression and activation of many serine proteases
have been clarified. The predicted maturation process of
HAT precursor described above is similar to that of the Bacillus
amyloliquefaciens subtilisin (41). The subtilisin is synthesized
as a membrane-associated precursor (preprosubtilisin) and released
outside the cell after it is autocatalytically converted to
an active form (42). Only mature subtilisin has been detected
extracellularly (41). Active HAT contained in sputum samples
was also detected extracellularly.

Fig. 5. Comparisons of the deduced amino acid sequence of the
catalytic portion of HAT with those of other serine proteases.
Identical amino acid residues are shaded, and the catalytic
triad of histidine, aspartic acid, and serine are indicated by
triangles. Hyphens represent gaps to bring the sequences to
better alignment.
[Alignment image; sequences compared: HAT (from residue 187),
hepsin, enteropeptidase, acrosin, and tryptase.]

Fig. 6. Northern blot analysis of HAT mRNA in various human
tissues. The blots were hybridized to HAT cDNA probe (upper panel).
The same filters were re-hybridized with β-actin probe as an
internal standard for the amounts of RNA loaded (lower panel).

It is possible that the membrane-bound HAT or the portion
remaining in the membrane after release of the soluble HAT
may be involved in some important physiological processes on 
the cell surface through interaction with ligands, other pro- 
teins, or the surface. Recent reports have shown that some 
viruses and a bacterial toxin utilize cell surface proteases as 
receptors (43-47), indicating other usage in addition to intrinsic
roles of these proteins.

Homology of Amino Acid Sequence of HAT with Other Pro- 
teases — To find any similarity in the primary structure be- 
tween HAT and known proteins, we surveyed publicly avail- 
able data banks. Previous investigators have shown that the 
serine protease family has a common catalytic site consisting of
three amino acid residues, His, Asp, and Ser, joined by hydrogen
bonds to display catalytic action as a catalytic triad, although
they are located apart from each other in the primary
structure of the enzyme (48). Based on these established facts,
the catalytic site of HAT is thought to consist of amino acid
residues His227, Asp272, and Ser368 (Fig. 5). In comparison of
the amino acid sequence of HAT with those of other serine 
proteases, the most striking similarity was found around this 
putative catalytic triad as shown in Fig. 5. Six of seven cysteine 
residues in the catalytic region of HAT were at identical posi- 
tions as those of other serine proteases (Fig. 5). Nine cysteine 
residues were contained in the deduced polypeptide of HAT 
precursor, and Cys20 was located in the predicted transmembrane
domain. Based on the locations of the known disulfide
bridges in other serine proteases (49), it is postulated that
the other eight cysteine residues may form four disulfide bonds, 
which are located at cysteine pairs 212/228, 337/353, and 364/ 
393 in the catalytic region and at 173/292 between the non- 
catalytic region and the catalytic region. 

It was shown that the amino acid sequence of the catalytic 
region of HAT was homologous to that of the other human 
serine proteases: 38% identity with hepsin (50), 32% with en- 
teropeptidase (51), 30% with acrosin (52), and 29% with mast 
cell tryptase (53). Hepsin, of which the catalytic region shows 
the highest similarity with that of HAT in this survey, is a cell 
surface protease widely expressed in various tissues including 
liver and is suggested to play a role in cell growth and main- 
tenance of cell morphology (54). 
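
Percent identity figures of this kind are computed from a pairwise alignment. The sketch below shows one common convention, counting identical residues over aligned columns in which neither sequence has a gap. The paper does not state which denominator was used, so this convention, the function name `percent_identity`, and the toy aligned fragments are all assumptions for illustration only.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over gap-free aligned columns.

    Both inputs must already be aligned (equal length, gaps as `gap`).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments, not taken from Fig. 5.
FRAGMENT_A = "GDSGGP-LV"
FRAGMENT_B = "GDSGGPALI"

identity = percent_identity(FRAGMENT_A, FRAGMENT_B)
print(identity)
```

Other conventions, such as dividing by the length of the shorter sequence or by the full alignment length, give different percentages for the same alignment, so reported values are only comparable when the same denominator is used.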

On the other hand, the amino acid sequence of the noncata- 
lytic region of HAT showed no significant similarity with those 
of other proteins and had neither kringle nor an EGF-like 
domain, which are found in some kinds of proteases relating to
the blood coagulation, fibrinolysis, and complement cascades 
(55). The function or roles of this unique and relatively long 
noncatalytic portion of HAT precursor are unknown. 

Northern Blot Analysis — Previously, we showed immunohis- 
tochemically that HAT protein was expressed in the cells of 
submucosal serous glands of human bronchi and trachea (18). 
Serous glands are widely distributed in various human tissues. 
Therefore multiple tissue Northern blot analysis was carried 
out to confirm that HAT mRNA was expressed in the human 
lower airway and also to clarify whether or not HAT mRNA 
was expressed in other human tissues. As shown in Fig. 6, a
1.9-kb transcript was detectable in only the trachea blot among 
the 17 different types of tissues examined, such as heart, brain, 
pancreas, lung, and liver. The mRNA size is in fairly good 
accordance with that (1517 bp) of the HAT cDNA established in 
the present work. In addition to the 1.9-kb mRNA, 3.0-kb and 
0.9-kb signals were weakly detectable in the trachea and pancreas
blots, respectively. These two transcripts may appear as a
result of an alternative splicing/polyadenylation process or
represent a cross-hybridizing mRNA, but the nature of these
transcripts is unknown. These results strongly suggest that HAT
mRNA is more actively expressed in the lower airway including 
trachea than in the other tissues examined and support our 
previous result that HAT is localized in cells of submucosal 
serous glands of trachea and bronchi. 

Although the native HAT was found in the sputum of pa- 
tients with chronic airway diseases, HAT mRNA is thought to 
be expressed in the normal tissues of healthy subjects, because 
the trachea poly(A)+ RNA subjected to the Northern blot was
obtained from the normal trachea tissues of three white male 
subjects who died of trauma or acute heart failure. It will be 
useful to compare expression levels of mRNA and protein of 
HAT in the patients with airway diseases with those in healthy 
subjects to clarify the physiological and pathophysiological
significance of HAT in the airway. In the airway, various kinds of
proteins such as lysozyme (56), secretory IgA (57), and secre- 
tory leukocyte protease inhibitor (58) are secreted from the 
submucosal serous glands onto mucous membrane and become 
constituents of airway mucous or bronchial secretions (59). 
These proteins play important roles in the host defense system
of airways together with respiratory mucous glycoproteins, 
which are secreted from mucous glands cells and goblet cells 
(59). HAT may be released from the serous glands with these 
proteins and play some biological role in the host defense sys- 
tem on the mucous membrane independently of or in coopera- 
tion with other substances in airway mucous or bronchial 
secretions. 

In summary, it was confirmed through the present work that 
HAT is a novel trypsin-like serine protease by analyzing the 
primary structure of the polypeptide deduced from the nucleotide
sequence of its cDNA. However, the mechanism of activation
of the HAT precursor to the mature enzyme, the physiological
role of the enzyme, and the biological significance of the
noncatalytic region of the precursor remain to be resolved.

REFERENCES 

1. Cohen, A. B. (1983) Am. Rev. Respir. Dis. 127, (suppl.) 2-58
2. Janoff, A. (1985) Am. Rev. Respir. Dis. 132, 417-433
3. Suter, S., and Chevallier, I. (1991) Eur. Respir. J. 4, 40-49
4. O'Conner, C. M., Gaffney, K., Keane, J., Southey, A., Byrne, N., O'Mahoney, S., and Fitzgerald, M. X. (1993) Am. Rev. Respir. Dis. 148, 1665-1670
5. Meyer, K. C., Lewandoski, J. R., Zimmerman, J. J., Nunley, D., Calhoun, W. J., and Dopico, G. A. (1991) Am. Rev. Respir. Dis. 144, 580-585
6. Lee, C. T., Fein, A. M., Lippmann, M., Holtzman, H., Kimbel, P., and Weinbaum, G. (1981) N. Engl. J. Med. 304, 192-196
7. Schwartz, L. B., Irani, A. M., Roller, K., Castells, M. C., and Schechter, N. M. (1987) J. Immunol. 138, 2611-2615
8. Caughey, G. H. (1994) Am. J. Respir. Crit. Care Med. 150, (suppl.) 138-142
9. Gervasoni, J. E., Jr., Conrad, D. H., Hugli, T. E., Schwartz, L. B., and Ruddy, S. (1986) J. Immunol. 136, 285-292
10. Pejler, G., and Karlstrom, A. (1993) J. Biol. Chem. 268, 11817-11822
11. Saarinen, J., Kalkkinen, N., Welgus, H. G., and Kovanen, P. T. (1994) J. Biol. Chem. 269, 18134-18140
12. Nakano, A., Kishi, F., Minami, K., Wakabayashi, H., Nakaya, Y., and Kido, H. (1997) J. Immunol. 159, 1987-1992
13. Janoff, A. (1983) J. Appl. Physiol. 55, 285-293
14. Sibille, Y., and Reynolds, H. Y. (1990) Am. Rev. Respir. Dis. 141, 471-501
15. Kido, H., Yokogoshi, Y., Sakai, K., Tashiro, M., Kishino, Y., Fukutomi, A., and Katunuma, N. (1992) J. Biol. Chem. 267, 13573-13579
16. Sakai, K., Kawaguchi, Y., Kishino, Y., and Kido, H. (1993) J. Histochem. Cytochem. 41, 89-93
17. Tashiro, M., Yokogoshi, Y., Tobita, K., Seto, J. T., Rott, R., and Kido, H. (1992) J. Virol. 66, 7211-7216
18. Yasuoka, S., Ohnishi, T., Kawano, S., Tsuchihashi, S., Ogawara, M., Masuda, K., Yamaoka, K., Takahashi, M., and Sano, T. (1997) Am. J. Respir. Cell Mol. Biol. 16, 300-308
19. Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
20. Hanahan, D. (1983) J. Mol. Biol. 166, 557-580

21. Birnboim, H. C., and Doly, J. (1979) Nucleic Acids Res. 7, 1513-1523
22. Kyte, J., and Doolittle, R. F. (1982) J. Mol. Biol. 157, 105-132
23. Smith, T. F., and Waterman, M. S. (1981) J. Mol. Biol. 147, 195-197
24. Frohman, M. A., Dush, M. K., and Martin, G. R. (1988) Proc. Natl. Acad. Sci. U. S. A. 85, 8998-9002
25. Feinberg, A. P., and Vogelstein, B. (1983) Anal. Biochem. 132, 6-13
26. Ausubel, F. M., Brent, R., Kingston, R. E., Moore, D. D., Seidman, J. G., Smith, J. A., and Struhl, K. (1997) Current Protocols in Molecular Biology, Suppl. 38, 16.9.1-16.11.12, John Wiley & Sons, Inc., New York
27. Wickham, T. J., Davis, T., Granados, R. R., Shuler, M. L., and Wood, H. A. (1992) Biotechnol. Prog. 8, 391-396
28. Lehrach, H., Diamond, D., Wozney, J. M., and Boedtker, H. (1977) Biochemistry 16, 4743-4751
29. Kozak, M. (1987) Nucleic Acids Res. 15, 8125-8148
30. Blow, D. M. (1971) in The Enzymes (Boyer, P. D., ed) 3rd Ed., Vol. 3, pp. 185-212, Academic Press, Inc., New York
31. Keil, B. (1971) in The Enzymes (Boyer, P. D., ed) 3rd Ed., Vol. 3, pp. 250-275, Academic Press, Inc., New York
32. Davie, E. W., and Neurath, H. (1955) J. Biol. Chem. 212, 515-529
33. von Heijne, G. (1983) Eur. J. Biochem. 133, 17-21
34. Wickner, W. (1980) Science 210, 861-868
35. von Heijne, G. (1985) J. Mol. Biol. 184, 99-105
36. von Heijne, G., and Manoil, C. (1990) Protein Eng. 4, 109-112
37. Hartmann, E., Rapoport, T. A., and Lodish, H. F. (1989) Proc. Natl. Acad. Sci. U. S. A. 86, 5786-5790
38. Parks, G. D., and Lamb, R. A. (1991) Cell 64, 777-787
39. High, S. (1992) Bioessays 14, 535-540
40. Semenza, G. (1986) Annu. Rev. Cell Biol. 2, 255-313

41. Wells, J. A., Ferrari, E., Henner, D. J., Estell, D. A., and Chen, E. Y. (1983) Nucleic Acids Res. 11, 7911-7926
42. Power, S. D., Adams, R. M., and Wells, J. A. (1986) Proc. Natl. Acad. Sci. U. S. A. 83, 3096-3100
43. Kido, H., Fukutomi, A., and Katunuma, N. (1991) Biomed. Biochim. Acta 50, 781-789
44. Delmas, B., Gelfi, J., L'Haridon, R., Vogel, L. K., Sjostrom, H., Noren, O., and Laude, H. (1992) Nature 357, 417-420
45. Yeager, C. L., Ashmun, R. A., Williams, R. K., Cardellichio, C. B., Shapiro, L. H., Look, A. T., and Holmes, K. V. (1992) Nature 357, 420-422
46. Söderberg, C., Giugni, T. D., Zaia, J. A., Larsson, S., Wahlberg, J. M., and Möller, E. (1993) J. Virol. 67, 6576-6585
47. Knight, P. J. K., Crickmore, N., and Ellar, D. J. (1994) Mol. Microbiol. 11, 429-436
48. Kraut, J. (1977) Annu. Rev. Biochem. 46, 331-358
49. Young, C. L., Barker, W. C., Tomaselli, C. M., and Dayhoff, M. O. (1978) Atlas of Protein Sequence and Structure (Dayhoff, M. O., ed) Vol. 5, pp. 73-93, National Biochemical Research Foundation, Washington, D. C.
50. Leytus, S. P., Loeb, K. R., Hagen, F. S., Kurachi, K., and Davie, E. W. (1988) Biochemistry 27, 1067-1074

51. Kitamoto, Y., Veile, R. A., Donis-Keller, H., and Sadler, J. E. (1995) Biochemistry 34, 4562-4568
52. Adham, I. M., Klemm, U., Maier, W. M., and Engel, W. (1990) Hum. Genet. 84, 125-128
53. Miller, J. S., Westin, E. H., and Schwartz, L. B. (1989) J. Clin. Invest. 84, 1188-1195
54. Kurachi, K., Torres-Rosado, A., and Tsuji, A. (1994) Methods Enzymol. 244, 100-114
55. Patthy, L. (1990) Blood Coagul. Fibrinolysis 1, 153-166
56. Bowes, D., and Corrin, B. (1977) Thorax 32, 163-170
57. Goodman, M. R., Link, D. W., Brown, W. R., and Nakane, P. K. (1981) Am. Rev. Respir. Dis. 123, 115-119
58. de Water, R., Willems, L. A., van Muijen, G. N., Franken, C., Fransen, J. A., Dijkman, J. H., and Kramps, J. A. (1986) Am. Rev. Respir. Dis. 133, 882-890
59. Kaliner, M., Shelhamer, J. H., Borson, B., Nadel, J., Patow, C., and Marom, Z. (1986) Am. Rev. Respir. Dis. 134, 612-621



Exhibit 44 



THE JOURNAL OF BIOLOGICAL CHEMISTRY
© 1999 by The American Society for Biochemistry and Molecular Biology, Inc.
Vol. 274, No. 21, Issue of May 21, pp. 14926-14935, 1999
Printed in U.S.A.



Corin, a Mosaic Transmembrane Serine Protease Encoded by a 
Novel cDNA from Human Heart* 

(Received for publication, January 11, 1999) 
Wei Yan, Ning Sheng, Marian Seto, John Morser, and Qingyu Wu‡

From the Departments of Cardiovascular Research and Biophysics, Berlex Biosciences, Richmond, California 94804 



A novel cDNA has been identified from human heart
that encodes an unusual mosaic serine protease, desig- 
nated corin. Corin has a predicted structure of a type II 
transmembrane protein and contains two frizzled-like 
cysteine-rich motifs, seven low density lipoprotein re- 
ceptor repeats, a macrophage scavenger receptor-like 
domain, and a trypsin-like protease domain in the extra- 
cellular region. Northern analysis showed that corin 
mRNA was highly expressed in the human heart. In 
mice, corin mRNA was detected by in situ hybridization 
in the cardiac myocytes of the embryonic heart as early 
as embryonic day (E) 9.5. By E11.5-13.5, corin mRNA was
most abundant in the primary atrial septum and the 
trabecular ventricular compartment. Expression in the 
heart was maintained through the adult. In addition, 
mouse corin mRNA was also detected in the prehypertrophic
chondrocytes in developing bones. By fluorescent
in situ hybridization analysis, the human corin
gene was mapped to 4p12-13 where a congenital heart
disease locus, total anomalous pulmonary venous re- 
turn, had been previously localized. The unique domain 
structure and specific embryonic expression pattern 
suggest that corin may have a function in cell differen- 
tiation during development. The chromosomal localiza- 
tion of the human corin gene makes it an attractive 
candidate gene for total anomalous pulmonary venous 
return. 



Serine proteases are essential for a variety of biological proc- 
esses including food digestion, complement activation, and 
blood coagulation (1-3). In Drosophila, serine proteases are 
also involved in developmental pathways. For example, serine 
proteases encoded by the nudel, gastrulation defective, easter,
and snake genes are key components of a proteolytic cascade 
that is critical for the establishment of the dorsal-ventral pat- 
tern in developing embryos (4-6). Genetic defects in these 
genes often lead to the disruption of the dorsal-ventral axis, 
resulting in embryonic lethality (7). 

* The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
"advertisement" in accordance with 18 U.S.C. Section 1734 solely to
indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted
to the GenBank™/EBI Data Bank with accession number(s) AF133845.

‡ To whom correspondence should be addressed: Berlex Biosciences,
15049 San Pablo Ave., Richmond, CA 94804. Tel.: 510-669-4737; Fax:
510-669-4246; E-mail: qingyu_wu@berlex.com.

Most serine proteases of the trypsin family are secreted
proteins. Several members from this family have been identified
that contain an integral transmembrane domain. Hepsin,
for example, is a serine protease expressed on the surface of
hepatocytes. Structurally, hepsin is a type II transmembrane
protein with the transmembrane domain at its amino terminus
and the protease domain at the carboxyl terminus exposed to
the outside of the cell (8). In tissue culture studies, hepsin was
shown to contribute to hepatocyte growth (9). However, the 
physiological significance of the growth stimulating activity of 
hepsin remains unknown (10). In Drosophila, Stubble-stub- 
bloid protein, another transmembrane serine protease, shares 
structural similarities with hepsin (11). Genetic studies dem- 
onstrated that Stubble-stubbloid is essential for epithelial mor- 
phogenesis and development of the fruit fly. Defects in the 
Stubble-stubbloid gene cause malformation of legs, wings, and 
bristles. Most recently, other transmembrane serine proteases 
were isolated and cloned from human trachea and small in- 
testine (12, 13). The biological function of these newly discov- 
ered membrane-bound serine proteases has not yet been 
determined. 

In this study, we report the cloning of a cDNA from the 
human heart that encodes a novel transmembrane serine pro- 
tease, designated corin. Corin has a predicted structure of a 
type II transmembrane protein containing two frizzled-like 
cysteine-rich motifs, seven LDL¹ receptor repeats, a macrophage
scavenger receptor-like domain, and a trypsin-like protease
domain in the extracellular region. In situ hybridization
revealed that corin mRNA was expressed in the embryonic 
heart as early as E9.5, and the expression in the heart was 
maintained through the adult stage. In addition, corin mRNA 
was detected in prehypertrophic chondrocytes of the developing
bones. The unusual domain structures and specific expression
pattern suggested that corin may have a function in cell
differentiation during embryonic development. 

EXPERIMENTAL PROCEDURES 

Materials — Human cancer cell lines, HEC-1-A (endometrium
adenocarcinoma), U2-OS (osteosarcoma), SK-LMS-1 (vulva sarcoma), RL95-2
(endometrium carcinoma), and AN3-CA (endometrium adenocarcinoma),
were obtained from the American Type Culture Collection
(ATCC). Human heart cDNA libraries and human and mouse multiple 
tissue Northern blots were purchased from CLONTECH (Palo Alto, 
CA). Mouse tissue sections used for in situ hybridization were pur- 
chased from Novagen (Madison, WI). Tissue culture media and supple- 
ments were from Life Technologies Inc. All other chemicals were ob- 
tained from Sigma. 

Isolation of Human Corin cDNA Clones — An expressed sequence tag 
(EST) clone was found in a human heart cDNA library from the Incyte 
EST data base that shared significant sequence homology with trypsin, 
indicating that the EST may encode a novel serine protease gene. A 
2.1-kb EcoRI-XhoI insert from this EST clone was used to screen a
human heart cDNA library (CLONTECH). Approximately, 5 x 10® 
lambda phage clones were screened, and two positive clones were iso- 
lated that contained inserts of 3.5 and 3.1 kb, respectively. The DNA 
sequences of these two clones were determined. Oligonucleotide
primers were designed to clone further 5' end cDNA sequences by 5' rapid
amplification of cDNA ends (RACE) using Marathon-ready human
heart cDNA templates (CLONTECH). The PCR products from 5' RACE
were cloned into pCRII vector (Invitrogen, San Diego, CA) and
sequenced. Oligonucleotide primers used in the 5' RACE experiments
were 5'-CAGTTGGTTTGAACAAGTGCAGGG-3',
5'-TGCAAGGAGGGATACGCTCGCCTG-3',
5'-AATCCCAAGAACAGACTCACAGCG-3',
5'-CGGGTCACAGAGAGAGCTACCACC-3',
5'-GGTCTCCTTCTTGACATGAATCTG-3',
5'-CGGAGCCCCATGAAGTTAAACCA-3', and
5'-AACAAAAGGATCCTTGGAGGTCGGACGAGT-3'. The final 5' end
sequence of human corin cDNA was derived from at least three
independent clones. The full-length cDNA sequence was compiled using
the Genetics Computer Group (GCG) software (version 9.1, Madison,
WI).

¹ The abbreviations used are: LDL, low density lipoprotein; EST,
expressed sequence tag; FISH, fluorescent in situ hybridization;
GAPDH, glyceraldehyde-3-phosphate dehydrogenase; ORF, open reading
frame; RT, reverse transcriptase; PCR, polymerase chain reaction;
RACE, rapid amplification of cDNA ends; TAPVR, total anomalous
pulmonary venous return; kb, kilobase pair; bp, base pair; E, embryonic
day.

This paper is available on line at http://www.jbc.org

Novel Serine Protease cDNA from Human Heart

Northern Analysis — Northern blots containing poly(A)+ RNA samples
(2 µg/lane) from multiple human and mouse tissues were purchased
from CLONTECH. Human and mouse corin cDNA probes were
labeled with [32P]dCTP using a random primed DNA labeling kit (Roche
Molecular Biochemicals). Northern hybridization was performed at
42 °C overnight in a solution containing 40% formamide, 5× Denhardt's
solution, 6× SSC, 100 µg/ml salmon sperm DNA, and 0.1% SDS. Blots
were washed with 0.2× SSC, 0.1% SDS at 60 °C and then exposed to
Fuji imaging plates. As a control, the blots were reprobed with a human 
actin cDNA probe provided by CLONTECH. 

RT-PCR—mRNA samples were isolated from HEC-1-A, U2-OS,
SK-LMS-1, and AN3-CA cells using a commercial RNA preparation kit
(Oligotex Direct mRNA Mini Kits, Qiagen). First strand cDNAs were
synthesized using Superscript II RNase H- reverse transcriptase (Life
Technologies Inc.). Human corin-specific oligonucleotide primers (sense
primer, 5'-AACAAAAGGATCCTTGGAGGTCGGACGAGT-3', and
antisense primer, 5'-CGGAGCCCCATGAAGTTAATCCA-3') were used to
amplify a 630-bp fragment of corin cDNA between nucleotides 2475 and
3105. Oligonucleotide primers TFR1
(5'-GTCAATGTCCCAAACGTCACCAGA-3') and TFR2
(5'-ATTTCGGGAATGCTGAGAAAACAGACAGA-3'), derived from the human
glyceraldehyde-3-phosphate dehydrogenase (GAPDH) gene, were used
as an internal quantification
control. PCR reactions were performed with a thermal cycler (Perkin- 
Elmer, model 480). PCR products were separated on 1% agarose gels 
and visualized by ethidium bromide staining. 
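
Basic bookkeeping for a primer pair such as the one above can be sketched in Python. The sense primer sequence and the nucleotide coordinates are those quoted in the text; the helper names are hypothetical, and treating the product length as the simple difference of the two quoted coordinates is an assumption about the numbering convention.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Sense primer as quoted in the text.
SENSE_PRIMER = "AACAAAAGGATCCTTGGAGGTCGGACGAGT"

# Coordinates quoted in the text for the amplified region.
FRAGMENT_START_NT = 2475
FRAGMENT_END_NT = 3105
# Assumed convention: length is the difference of the quoted coordinates.
fragment_length = FRAGMENT_END_NT - FRAGMENT_START_NT

sense_gc = gc_fraction(SENSE_PRIMER)
sense_rc = reverse_complement(SENSE_PRIMER)

print(len(SENSE_PRIMER), sense_gc, fragment_length)
```

An antisense primer anneals to the sense strand, so its reverse complement is the string to search for when locating it in a cDNA sequence written 5' to 3'.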

In Situ Hybridization — Mouse adult heart and embryonic tissue 
sections were deparaffinized in xylene, rehydrated, and fixed in 4% 
paraformaldehyde. The tissues were digested with proteinase K (20
µg/ml), then treated with triethanolamine/acetic anhydride, and
dehydrated. An 800-bp mouse corin cDNA fragment from the coding region
was cloned into pCRII (Invitrogen) in two orientations to yield plasmids
pM11 and pM41. The plasmids were linearized by HindIII digestion.
Sense and antisense probes were synthesized using T7 RNA polymer- 
ase (T7/SP6 transcription kit, Roche Molecular Biochemicals) and la- 
beled with [^^P]UTP (Amersham Pharmacia Biotech). The hybridiza- 
tion was carried out as described (14). The slides were dehydrated and 
dipped in Kodak NTB-2 emulsion and exposed for 4 weeks in light-tight 
boxes at 4 °C. Photographic development was carried out in a Kodak 
D-19 developer. The slides were stained with hematoxylin/eosin and 
analyzed using both light- and dark-field optics of a Zeiss microscope. 

Fluorescent in Situ Hybridization (FISH) Analysis — P1 phage clones 
containing the human corin gene were isolated by filter hybridization 
using a human corin cDNA as the probe. One clone was confirmed by 
DNA sequencing using a primer from human corin cDNA. The DNA 
fragment from this P1 phage was labeled with digoxigenin-dUTP. The 
labeled probe was combined with sheared human DNA and hybridized 
to metaphase chromosomes derived from PHA-stimulated peripheral 
blood lymphocytes in a solution containing 50% formamide, 10% dextran 
sulfate, and 2x SSC. Hybridization signals were detected by 
fluorescent-labeled antidigoxigenin antibodies and counter-staining with 
4',6-diamidino-2-phenylindole. A total of 80 metaphase cells were 
analyzed of which 74 cells exhibited specific labeling. 

Homology Model of the Protease Domain of Corin — A model of the 
corin protease domain (amino acids 802-1042) was built based on the 
structure of bovine chymotrypsinogen A at 1.8-Å resolution (15, 16), 
using the homology program (Insight II, 1995, MSI, San Diego, CA). 
Rotamers were used for non-identical side chain replacements (16). 
Coordinates for the loop insertions were extracted from the Brookhaven 
protein data bank (17). The model was refined by energy minimization 
using the AMBER force field (Discover 95.0), with a distance-dependent 
dielectric constant. The minimization used the steepest descents and 
conjugate gradient methods as follows: first for the loops only where 
insertions and deletions occurred, then side chains, and a final round of 
minimization keeping the Cα atoms fixed. The residues of corin (His843, 
Asp892, and Ser985) corresponding to the catalytic triad of the template 
structure were also held fixed. 

RESULTS 

Cloning of the Full-Length Human Corin cDNA — A computer 
search using the BLAST program identified an EST clone from 
a human heart library that shared significant homology with 
serine protease family members, such as trypsin. The EST 
clone was used to isolate the full-length cDNA of a novel gene, 
designated corin for its abundant expression in the heart. The 
sequence of the full-length corin cDNA, 4933 bp in length, is 
shown in Fig. 1. The size of the cDNA is consistent with the 
length of corin mRNA (~5 kb) detected by Northern analysis 
(Fig. 4A). An ATG codon is located at position 95 that may 
represent the translation initiation site. The open reading 
frame (ORF) spans 3126 bp with a 5'-untranslated region of 94 
nucleotides before the initiation codon. At the 3' end, there is a 
1.7-kb 3'-untranslated region after the stop codon at position 
3221. A polyadenylylation signal of AATAAA is present 12 
nucleotides before the poly(A) tail. 
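
The coordinates quoted in this paragraph are mutually consistent, which can be verified with simple arithmetic. The sketch below is illustrative only (the function and key names are assumptions); it uses 1-based positions and treats the stated stop position as the first base of the stop codon.

```python
# Sketch only: names are hypothetical; the numbers are those stated in the text.
def orf_summary(cdna_length: int, atg_position: int, orf_length: int) -> dict:
    """Derive UTR sizes, stop-codon position, and codon count from 1-based coordinates."""
    stop_position = atg_position + orf_length        # first base of the stop codon
    return {
        "five_prime_utr": atg_position - 1,           # bases before the ATG
        "stop_position": stop_position,
        "codons": orf_length // 3,                    # sense codons in the ORF
        "orf_in_frame": orf_length % 3 == 0,
        "three_prime_utr": cdna_length - (stop_position + 2),  # bases after the stop codon
    }

if __name__ == "__main__":
    summary = orf_summary(cdna_length=4933, atg_position=95, orf_length=3126)
    for key, value in summary.items():
        print(f"{key}: {value}")
```

With the values from the text, the 3'-untranslated region comes out at 1710 nucleotides, in line with the stated 1.7 kb.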

The Domain Structure of Human Corin — The ORF of the 
human corin cDNA encodes a polypeptide of 1042 amino acids 
with a calculated mass of 116 kDa. At the amino terminus of 
the predicted corin protein, there is no discernible signal pep- 
tide sequence. Hydropathy plots using the GCG program iden- 
tified a highly hydrophobic region between amino acids 46 and 
66 (Fig. 2B). This hydrophobic sequence could serve as a po- 
tential transmembrane domain. There are positively charged 
amino acid residues immediately preceding the putative transmembrane 
segment, suggesting that corin is a type II transmembrane 
protein with the amino terminus present in the 
cytosol (18). Consistent with this hypothesis, there are 19 pre- 
dicted N-linked glycosylation sites present in the extracellular 
domains of corin (Fig. 1). 
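
A hydropathy plot of the kind described above is a sliding-window average of per-residue hydrophobicity values. The sketch below uses the standard Kyte-Doolittle scale; the window size, the helper names, and the toy sequence are assumptions for illustration, and this is not the GCG implementation used in the study.

```python
# Sketch only: Kyte-Doolittle scale with a hypothetical window size and toy sequence.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence: str, window: int = 19) -> list:
    """Mean hydropathy of every full window along the sequence."""
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]

def most_hydrophobic_window(sequence: str, window: int = 19) -> int:
    """0-based start index of the first window with the highest mean hydropathy."""
    profile = hydropathy_profile(sequence, window)
    return max(range(len(profile)), key=profile.__getitem__)

if __name__ == "__main__":
    toy = "DDDDD" + "LLLLLLLLLL" + "KKKKK"   # hypothetical sequence, not corin
    print(hydropathy_profile(toy, window=5))
    print(most_hydrophobic_window(toy, window=5))
```

A sustained stretch of high window averages, roughly 20 residues long, is the usual signature of a candidate membrane-spanning segment.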

Analysis of the corin protein sequence showed that in the 
extracellular region there are two frizzled-like cysteine-rich 
domains, seven LDL receptor repeats, one macrophage scav- 
enger receptor-like domain, and one trypsin-like serine prote- 
ase domain (Fig. 2A). As shown in Fig. 2A, two frizzled-like 
cysteine-rich domains are located at amino acids 134-259 and 
450-573, respectively. Amino acid sequences of these two domains 
share significant similarities with the extracellular 
cysteine-rich domain of the Drosophila Frizzled protein, a 
seven-transmembrane receptor essential for polarity determination 
during the development of the fruit fly (19). The frizzled-like 
cysteine-rich domains have also been found in other proteins, 
such as Dfz2 in Drosophila (20), Lin-17 in Caenorhabditis 
elegans (21), and FZ-1 in human (22). The sequences of the two 
frizzled-like cysteine-rich domains in corin are closest to those 
in Lin-17 and FZ-1. As shown in Fig. 2C, all the 10 conserved 
cysteine residues are present in the frizzled-like cysteine-rich 
domains of corin. 

Between amino acids 268-415 and 579-690 (Fig. 2, A and 
D), there are seven cysteine-rich repeats homologous to the 
LDL receptor class A repeats (23). Each repeat is about 36 
amino acids long and contains six cysteine residues as well as 
a highly conserved cluster of negatively charged amino acids. 
In the LDL receptor, these cysteine-rich repeats bind calcium 
ions and play an essential role in endocytosis of the extracel- 
lular ligands (23). Similar motifs have been found in the extra- 
cellular domain of other membrane receptors, such as LDL 
receptor-related protein (LRP1) (24), megalin (also known as 
LRP2 or gp330) (25), complement proteins (26), enterokinase 
(27), and Drosophila proteins yolkless and nudel (28, 29). 

In addition to the frizzled-like cysteine-rich domains and 
LDL receptor-like repeats, there is another cysteine-rich region 
between amino acids 713 and 801 in corin (Fig. 2, A and E). 






Novel Serine Protease cDNA from Human Heart 



X AAAT^TCCCTACTGCCTCCCCGG(»CACACCTACACCMIACAAJULCCGACCAACA 60 

61 ACTtM»CAGAAOAATAAGCGAGACTTTTTATCX»rGAAACACTCTCCTGCCCTCGCTCCS 12 0 

MKOSPALAP 9 

131 CAA(^GCCC7ACCGCACACCCCCCTCCCCJIAAGCXX»TCTTGACACCTCATGACAATAAC IBO 

10BEBrRRAGSPKPVI.BAODI)K 39 

IBl ATGCGCAATGGCTGCTCTCAGAXCCTCCCGACTGCTAACCrrcxnTCaJTTCCTATTGCTC 210 

SOHCUGCSQKLATANLLR F L L L 49 

241 GTCCTGATTCCATGTATCTGTOCTCTCOrTCTCrTCCTCCTCATCCTGCTTTCCTATGTT 300 

50 ^i.iPCT.CKi.vi.Li.wii.\. s y V 69 

301 CGAACATTACAAAAGGTCTAnTTAAATCAAATCGCACTGAACCTTTGCTCACTGATGGT 360 

TCGTLQKVYFKSnCSEPLVTDG 89 

361 GAAATCCAAGGGTCCGATCTTATTCTTACAAATACAATTTATAACCAGAGCACTCTCGTC 4 20 

90EIOGSOVILrNTIYaOSTVV 109 

4 21 TCTACTCCACATCCCGACCAACACGTTCCAGCCTGGACTACGGATGCrTCTCTCCCAGCG 480 

lie STAHPDQUVPAHTTOASLPG 129 

4B1 GACCAAAGTCACAGGAATACAAGTGCXrrGTATGAACATCACCCACAGCCAGTGTCAGATG 540 

130 DQSHR£TSACH]«ITUSCCOM 149 

541 CTCCCCTACCACGCCACGCTGACACCTCTCCTCTCAGTTGTCAGAAACATGGAAATGGAA 600 

150 LPYHATLTPLLSVVBMHEHB 169 

601 AACTTCCTCAA&TTTTTCACATATCTCCATCGCCTCACTTGCTATCAACATATCATCCTG 660 

ITCKFLKPFTYLHRLSCrOBIML 189 

661 TTTGCCTGTACCCTCCCCTTCOCTGAGTGCATCATTGATGGCGATCACAGTCATGGACTC 720 

190 PGCTLAPPEC I IDGOOSHGL 209 

721 CTGCCCTGTAGGTCCTTCTGTCAGGCTGCAAAAGAAGGCTGTGAATCAGTCCTGGGGATG 780 

210 LPCRSPCEAAK£GCESVLGH 339 

7B1 CTCAATTACTCCTGGCCGGATTTCCTCAGATCCTCCCAGTTTAGAAACCAAACTGAAAGC 84 0 

230 VfiYSWPDFLRCSOPRnOTBS 249 

641 ACCAATGTCAGCAGAATTTGCTTCrCACCTCACCAGCAAAACGGAAAGCAATTGCTCTGT 900 

250 S£VSRICFSPOOElilGKOLLC 269 

901 GGAAGGGGTGAGAACTTTCTGTGTGCCAGTGGAATCTGCATCCCCGGGAAACTGCAATGT 960 

270 GRG ENFLCASG t C I PGK LOG 3S9 

961 AATG6CTACAACGACTGTGACGACTGCAGTGACGAGGCTCATTGCAACTGCAGCCACAAT 1020 

290 NGYHOCDDWSDEAHCUCS EH 309 

1021 CTGTTTCACTGTCACACAGGCAACTGCCTTAATTACAGCCTTGTGTGTGATGGATATGAT 1080 
310 LFUGHTGKCLg^S^^C^C^^ 

10 81 gactgtggggatttgagtgatgagcaaaactctgattgcaatcccacxacagagcatcgc 1140 

330 dcgolsobqnc:dch?ttbbr 349 



1141 TGCGGGGACGGCCGCTCCATCGCCATGGACTGGGTGTGTGATGGTGACCACGACTGTGTG 1200 

350 CGOCRCZAHEWVCDCDBDCV 369 

1201 GATAAGTCCGACGAGGTCAACTGCTCKTGTCACAGCCAGGGTCTGGTGCAATGCAaAAAT 1260 

370 DK3DEVBCSCHS0GLVECRH 389 

1261 GGACAATGTATCCCCAGCACGTTTCAATGTGATGGTGACGAGCACTGCAAGGATGGGAGT 1320 

390 GOCIPSTFOCDGDEDCKDGS 409 

1321 CATCAGGAGAACTGCAGCGTCATTCACACTTCATGTCAAGAAGGACACCAAAGATCCCTC 1380 

410 DEEUCSVIOTSCOBCDORCL 439 

1 3 B 1 TACAATCCCTGCCTTGATTCATGTGGTGGTAGCTCTCTCTCTGACCCG AACAACAGTCTG 14 4 0 

430 VNPCLDSCGGS3LCDPHHSL 449 

1441 AATAACTGTAGTCAATGTGAACCAATTACATTGGAACTCTGCATQAATTTCCCCTACAAC 1500 

450 NNCSQCEPITLELCMNLPYH 469 

1501 AGTACAAGTTATCCAAATTATTTTGGCCACAGGACTCAAAAGGAAGCATCCATCAGCTGG 1560 

470 STSYPHyPGHRTORCAS I SM 489 

1561 GAGTCTTCTCTTTTCCCTGCACTTGrrCAAACCAACTGTTATAAATACCTCATGTTCTTT 1620 

490 ESSLFPALVQTMCYKYLMFF 509 

1621 TCTTGCACCATTTTGCTACCAAAATGTGATGTGAATACAGGCGAGCGTATCCCTCCTTGC 1680 

510 SCTILVPKCDVNTGERI PPC 529 

1681 AGGGCATTGTGTGAACACTCTAAAGAACGCTGTGAGTCTGTTCTTGGGATTGTGGGCCTA 1740 

530 RALCER5KERCBSVLCIVGL 549 

1741 CAGTGGCCTGAAGACACACATTGCAGTCAATTTCCAGaGGAAAATTCAGACAATCAAACC 1800 

550 OWPEDTDCSQFPEEHSDHQT 569 

1801 TOCCTGATGCCTGATCAATATGTGGAAGAATGCTCACCTACTCATTTCAAGTGCCGCTCA 1860 

570 CLMPDEYVEECSPSHFKCRS 589 

1861 GGACAGTGTCTTCTGGCTTCCAGAAGATGTGATGGCCAGCCCGACTGTGACGATCACAGT 1930 

590 CQCVLASRHCDGOADCDDDS 609 

1921 GATGAGGAAAACTGTGGTTGTAAAGAGAGAGATCTTTGGGAATGTCCATCCAATAAACAA 1980 

610 DEENCGCXERSLWeCPSMKO 629 

1981 TGTTTGAAGCACACAGTGATCTCCCATCCCTTCCCAGACTGCCCTGATTACATGCACGAG 204 0 

630 CLKHTVICDGFPDCPDYHDE 6*9 

2041 AAAAACTGCTCATTrrGCCAAGATGATGAGCTGGAATGTGCAAACCATCCGTGTCTGTCA 2100 

650 KgCSFCODDELECANHACVS 669 

2101 CGTGACCTCTGGTGTCATGCTGAAGCCGACTGCTCAGACAGTTCAGATGAATGGGACTCT 3160 

670 RDLHCDGBADCSDSSDBHDC 689 

2161 GTGACCCTCTCTATAAATGTGAACTCCTCTTCCTTTCTGATGGTTCACAGAGCTGCCACA 2220 

690 VTLSIMVflSSSFLMVOHAAT 709 

3231 OAACACCATGTGTGrGCAGATGGCTGCCAGGAGATATTGACTCAGCTGGCCTGCAAGCAG 33 BO 

710 BBHVCADGWQEZ LSQLACKO 729 



3281 ATCCCTTTACCACAACCATCTGTGAOCAAATTGATACACCAACAGCACAAAGAGCCGCCC 33 40 
730 HGLGCPSVTKLIQEQBKBPR 749 

2341 TGGCTGACATTACACTCCAACTGCGAGAGCCTCAATGGGACCACTTTACATGAACTTCTA 34 00 
750 HLTLBSHHBSL^GTTLH ELL 769 

3401 CTAAATGGGCACTCTTCTCAGAGCACAAGTAAAATrrCTCTTCTCTGTACTAAACAAGAC 34 60 
7 70 VHG0SCBSRSKX6LLCTKOI> 789 

3461 TCTGGGCGCCGCXXrrGCTCCCCaAATCAACAAAACGATCCrTCCAGGTCCCACGACTCCC 3530 
790 CGRRPAAHMNKR^XLGCRTSR B09 

3521 CCTGGAAGGTGGCCATGCCACTGTTCTCTGCAGAGTGAACCCACTGGACAIATCTGTGGC 3580 
810 PGRUPHQC6L08BP8GI1 fCG 839 

3 581 TGTGTCCTCATTGCCAACAACTGGGrrCTGACAGTTCCCCACTGCTTCGACGGGAGACAC 3640 
830 CVLI AKRHVLTVA|{CrEGRE B49 

3 641 AATGCrrcaVCTTTGGAAAGTGGTCCTTGGCATCJUlCAATCTACACCATOCATCAGTGTTC 2703 
aSO HAAVHKVVL G I H HLDH P 6V T 869 

3701 ATGCACACACGCTTTGTGAACACCATCATCCTCCAnXXXX»rrACAGTCCJU;CAGTGGTC 3760 
870 MOTRFVKT IILRPRYSRAVV 889 

3761 GACTATGACATCAGCATCCTTGAGCTCACTCAAGACATCAGTGAGAC7GCCTACGTCCGG 3830 
890 DYj^ISIVELSEDISBTGYVR 909 

3831 CCTGTCTOCTTGCCCAACCCGGAGCACTGGCTAGAGCCTGACACGTACTGCTATATCACA 3880 
910 PVCLPNPBQWLBPOTYCYIT 929 

3881 GGCTGGGGCCACATGGGCAATAAAA7GCCATTTAAGCTGCXAGACGGAGAGGTCCGCATT 3940 
930CWCHMCHKMPFRI.OECEVRI 949 

3941 AlTTCTCTGGAACATTCTCAGTCCTACTTTGACATGAAGACCATCACCACTCXiGATGATA 3000 
930 ISLEHCQSYFDHKTITTRMZ 969 

3001 TGTGCTGOCTATGACTCTGGCACAGTTCATTCATGCATCGCTCACACCCCTCCCCCTCTT 3060 
970 CAGYESGTVDSCHGDSGGPL 989 

3061 GTTTGTGAGAAGCCTCGAGGACGCTCGACATTATTTCCATTAACTTCATGCCCCTCCGTC 3130 

990 VCEKPCGRHTLFGLTSHCSV 1009 

3121 TGCrrTTCCAAAGTCCTGGGGCCTGGCGTTTATAGTAATGTGTCATATTTCGTCGAATGG 3180 

1010 CFSKVLGPCVYSflVSYrVEW 1039 

3161 ATTAAAAGACAGATTTACATCCAGACCTTTCTCCTAAAC TAA TTATAAGQATGATCAQAG 3240 
1030 IKROIYIQTPLLN* 

3241 ACTTTTGCCAGCTACACTAAAAGAAAATGGCCrrCTTGACTGTGAAGAGCTGCCTGCAGA 3300 

3301 GAGCTGTACAGAAGCACTTTTCATCGACAGAAATGCTCAATCCTGCACTCCAAATTTGCA 3360 

3361 TGTTTGTTTTGGACTAATTTTTTTCAATTTATTTTTTCACCTTCATTTTTCTCTTATTTC 3420 

3421 AAGTTCAATGAAAGACTTTACAAAAGCAAACAAAGCAGaCTTTGTCCTTTTGCCAGGCCT 34 80 

3481 AACCATGACTGCAGCACAAAATTATCGACTCTGGCGAGATTTAAAATCAGCTGCTACAGT 3540 

3541 AACAGCTTATGGAATGGTCTCTTTTATCCrATCACAAAAAAAGACATAGATArTTAGGCT 3600 

3601 CATTAATTATCTCTACCAGTTTTTGTTTCTCAACCTCACTCCATAGTGGTAAA7TTCAGT 3660 

3661 GTTAACATTGGAGACTTGCrrrrCTTTTTCTTTTTTTATACCCCACAATTCTTTTTTATT 3720 

3721 ACACTTCGAATTTTAGGGTACACGAGCACAACGTGCACGTTAGTT ACATATCTATACATG 3780 



3781 TGCCATGTTGGTCTGCTGAACCCAGTAACTCGTCArTTGATTTATTAAAAGCCAAGATAA 3840 

3841 TTTACATGTTTAAAGTATTTACTArr ACCCCCTTCTAATGTTTGCATAATTCTGAGAACT 3900 

3901 GATAAAAGACAGCAATAAAACACCAGTGTCATCCATTTAGGTAGCAAGACATATTGAATG 3960 

3961 CAAACTTCTTTAGATATCAATATTAACACTTGACATTATTGCACCCCCCATTCTCGATCT 4020 

4 021 ATATCAAGATCATAATTTTATACAAGAGTCTCTATAGAACTGTCCTCATAGCTGGGTTTG 4080 

4081 TTCAGGATATATGAGTTGGCTGATTGAGACTGCAACAACTACATCTATATTTATGGGCAA 4140 

4141 TATTTTGTTTTACTTATGTGCCAAAGAACTGCATATTAAACT-rTGCAAAACAGAATTTAC 4200 

4 201 ATGAGAGATGCAATTTTTTAAAAAGAAAATTAATTTGCATCOCTCGTTTAATTAAATTTA 4 260 

4 261 TTTTTCAGTTTTCTTGCGrTCATCCATACCAACAAAGTCATAAAGACCATATTTTAGAGC 4320 

4 321 ACAGTAAGACTTTGCATGGAGTAAAACATTTTGTAATTTTCCTCAAAAGATGTTTAATAT 4 380 

4JH1 CTGGTTTCTTCTCATTCGTAATTAAAATTTTAGAAATGArrT7TAGCTCTAGGCCACTTT 4440 

4 441 ACCCAACTCAATTTCTCAAGCAATTAGTGGTAAAAACTATTTTTCCCCACTAAAAAACTT 4 500 

4 501 TAAAACACAAATCTTCATATATACTTAATTTAATTAGTCAGGCATCCATTTTGCCTTTTA 4 560 

4 561 AACAACTAGG ATTCCCTACTAACCTCCACCACC AACCTGGACTGCCTCAGCATTCCAAAT 4620 

4 621 AGATACTACCTGCAATTTTATACATGTATTTTTGTATCTTTTCTGTGTGTAAACATAGTT 4 680 

4 681 GAAATTCAAAAAGTTGIACCAATTTCTATACTATTCATCTCCTCTCCTTCAGTTTGTATA 4740 

4741 AACCTAAGGAGAGTGTCAAATCCAGCAACTGAATTGTGGTCACGATTCTATCAAAGTTCA 4800 

4 801 AGAACATATGTCAGTTTTGTTACACTTCTAGCTACATACTCAATGTATCAACTTTTAGCC 4860 

4 861 TGCTCAACTTAGGCTCAGTGAAATATATATATTATACTTATTTTAAATAATTCTTAATAC 4920 

4 921 AAATAAAATGGTA 4933 



Fig. 1. Nucleotide sequence of human corin cDNA and its deduced amino acid sequence. The potential codon for the initial methionine, 
the translation stop codon, and the polyadenylylation signal are in boldface type and underlined. The putative transmembrane domain is 
double underlined. The 19 potential N-linked glycosylation sites are in boldface type and double underlined. An arrow indicates the putative 
cleavage site for the activation of the serine protease. The active site residues of the catalytic triad (His843, Asp892, and Ser985) are in boldface type 
and underlined. 
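
Potential N-linked glycosylation sites such as those marked in Fig. 1 are conventionally predicted by scanning for the sequon Asn-X-Ser/Thr, where X is any residue except proline. The sketch below shows one way to do this scan; the function name and the test peptide are hypothetical, and the authors' own prediction method is not stated.

```python
# Sketch only: a conventional N-X-S/T sequon scan on a hypothetical peptide.
import re

# Lookahead keeps matches one residue wide so that overlapping sequons are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")

def n_glycosylation_sites(protein: str) -> list:
    """Return 1-based positions of Asn residues in N-X-S/T sequons (X is not Pro)."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]

if __name__ == "__main__":
    peptide = "MNASNPTNGTA"   # hypothetical peptide, not taken from corin
    print(n_glycosylation_sites(peptide))
```

A sequon is only a necessary condition; whether a given site is glycosylated has to be established experimentally.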




Fig. 2. A, a schematic presentation of the domain structure of corin protein. The transmembrane domain (TM), frizzled-like 
cysteine-rich domains (CRD), LDL receptor repeats (LDLR), scavenger receptor cysteine-rich domain (SRCR), and serine protease 
catalytic domain (Catalytic) are indicated. Numbers correspond to the amino acid residues of the ORF shown in Fig. 1. B, hydropathy 
plots of the deduced amino acid sequence of corin by Goldman and Kyte-Doolittle methods, respectively (36). Hphobic, hydrophobic; 
Hphilic, hydrophilic. C, alignment of amino acid sequences of the frizzled-like cysteine-rich domains from corin and other members 
of the frizzled family, including Frizzled in Drosophila, lin-17 in C. elegans, and FZ-1 in human. D, alignment of amino acid 
sequences of the seven LDL receptor repeats of corin with the consensus sequence derived from the human LDL receptor. E, alignment 
of amino acid sequences of the scavenger receptor-like cysteine-rich domains from corin and human enterokinase (Entk), sea urchin 
speract receptor (q17064) and human scavenger receptor I (o15393). Asterisks indicate conserved residues. F, alignment of amino 
acid sequences of protease domains from human corin, prekallikrein (KAL), enterokinase (ENTK), trypsin (TRPI), and bovine 
chymotrypsinogen A (CTRA). 



C

Frizzled   53       CEPIT XSICKNIPYN MTIMPNLIGH TKQEEAGI.. EVHQPAPLVK
lin17      28       CIPID I&LCKDLPYN YTYPPNTILH NDQH..TliQT BTBHFKPLMK
Human FZ   39       CQPIS IPI*CTOIAYN QTIHPMLLGH TNQEDAGIi.. EVHQPYPLVK
Corin     134  RNTSACMNIT HSQCQMLPYH ATLTPLLSVV RNME...MEK FLKFFTYLHR
Corin     450  NNCSQCEPIT LELCMNLPYN STSYPNYFGH RTQKEASISW ESSLFPALVQ

Frizzled   96  .GCSDDLQLF LCSLyVPVCT I.LERP..IP PCRSI^CB.SA RVCEKLMKTY
lin17      71  TKCHPHIHPP ICSVPAPMCP IGMPQA..VT SCKSVCEQVK ADCFSILEEF
Human FZ   82  VQCSPELRPP LCSMYAPVCT V.I*EQA..IP PCRSICERAR QGCEALMNKP
Corin     177  LSCYQHIMLF GCTLAFPECI IDGDDSHGLL PCRSFCEAAK EGCESVLGMV
Corin     500  TNCYKYLMFF SCTILVPKCD VNTGER..IP PCRALCEHSK ERCESVLGIV

Frizzled  142  NPNWPENUEC SKFPVHGGED .LCV
lin17     119  GIGWPEPLKC AQPPDPPE.. .LCMKP
Human FZ  129  GFQWPliHLRC EHPPRHGAEQ .ICV
Corin     234  NYSWPDFLRC SQFRNQTESS NVSRICFSP
Corin     546  GLQWPEDTDC SQFPEENSDN QTCLMP







D

Corin  268  LCGRGENFL  CASGICIP   GKLQCNGYND  CDDWSDEAHC
Corin  305  NC.SENLFH  CHTGKCLN   YSLVCDGYDD  CGDLSDEQNC
Corin  341  SCHSQGLVE  CGDGRCIA   MEWVCDGDHD  CVDKSDEVNC
Corin  378  SCHSQGLVE  CRNGQCIP   STFQCDGDED  CKDGSDEENC
Corin  579  EC.SPSHFK  CRSGQCVL   ASRRCDGQAD  CDDDSDEENC
Corin  616  GCKERDLWE  CPSNKQCLK  HTVICDGFPD  CPDYMDEKNC
Corin  654  FC.QDDELE  CANHACVS   RDLWCDGEAD  CSDSSDEWDC
LDLR        C          C G CI     CD D        C D SDE



E

Corin   713  VCADGWQEIL SQLACKQMGL GEPSVTK.LI QEQEKEPRWL TLHSNWESLN
Entk    702  .CAEWn'TQI SNDVCQLLGL GSGNS3K.PI PSTDGGP.FY KLNTAPD...
q17064  287  ICDDGWDWAD AKVACRQAGV KGAIRSSGPQ GEDPGYT.VIG PtHTSVVTCT
o15393   63  VCODDWHEKY GRAACRDWGY KNNFYSSQGI VDDSGSTSFM KLNT....SA
             * * * *

Corin   762  GTTLHELLVN GQ.SCESRSKI SLLCTKQDC
Entk    746  GHX/...ILTP SO-OCLQDSI/I RLQCNHKSC
q17064  337  GT..ESSLAO ....CVLRDGW SHSCQHVED
o15393  109  GKVDIYKKLY HSDACSSKAW SLRC



F

KAL    RIVGGTNSSW GEWPWQVSLQ VKLTAQRHLC GGSLIGHQWV LTAAHCPDGL
ENTK   KIVGGSNAKE GAMPWVVGLY Y...GGRLLC CASLVSSDWL VSAAHCVYGR
TRPI   KIVGGYNCEE NSVPYQVSL. ...NSGYHFC GGSLINEQWV VSAGHCYKSR
CTRA   RIVNGEEAVP GSWPWQVSLQ DKT..GFHFC GGSLINENWV VTAAHC....
Corin  RILGGRTSRP GRWPWQCSLQ SEPSG..HIC GCVLIAKKWV LTVAHCFEGR

KAL    PLQ.DVWRIY SGILNLSDIT K.DTPFSQIK EIIIHQNYKV SEGNHDIALI
ENTK   NLEPSKWTAI LGLHMKSNLT SPQTVPRLID EIVINPHYNR RRKDNDIAMM
TRPI         IQVR LGEHNIEVLE GNEQFINAAK .IIRHPQYDR KTLNNDIMLI
CTRA   ..GVTTSDVV VAGEFDQGSS SEKIQKLKIA KVFKNSKYNS LTINNDITLL
Corin  E.NAAVWKVV LGINNL.DHP SVFMQTRFVK TIILHPRYSR AVVDYDISIV

KAL    KLQAPLNYTE FQKPICLPSK GDTSTIYTNC WVTGWGFSKE KG.ETQNILQ
ENTK   HLEFKVNYTD YIQPICLPEE NQVFPPGRKC SIAGWGTVVY QG.TTANILQ
TRPI   KLSSRAVINA RVSTISLPTA PPAT..GTKC LISGWGNTAS SGADYPDELQ
CTRA   KLSTAASFSQ TVSAVCLPSA SDDFAAGTTC VTTGWGLTRY TNANTPDRLQ
Corin  ELSEDISETG YVRPVCLPNP EQWLEPDTYC YITGWGHMGN KM.PFK..LQ

KAL    KVNIPLVTNE ECQKRYQDYK ITQRMVCAGY KEGGKDACKG DSGGPLVC.K
ENTK   EADVPLLSNE RCQQQMPEYN ITENMICAGY EEGGIDSCQG DSGGPLMC.Q
TRPI   CLDAPVLSQA KCEASYPG.K ITSNMFCVGF LEGGKDSCQG DSGGPVVCNG
CTRA   QASLPLLSNT NCKKYWGT.K IKDAMICAG. .ASGVSSCMG DSGGPLVCKK
Corin  EGEVRIISLE HCQSYFDMKT ITTRMICAGY ESGTVDSCMG DSGGPLVCEK

KAL    HNGMWHLVGI TSWGEGC.AR REQPGVYTKV AEYMDWI
ENTK   ENSRWFVAGV TSFGYKC.AL PNRPGVYARV SRFTEWIQ
TRPI   Q    LQGVV SWGDGC.AQ  KNKPGVYTKV YNYVKWIKNT I
CTRA   .NGAWTLVGI VSWGSSTCS. TSTPGVYARV TALVNWVQQT L
Corin  PGGRWTLFGL TSWGSVCFSK VLGPGVYSNV SYFVEWIKRQ IYIQTFLLN







Disulfide bonds labeled in the model: C817-C830, C828-C844, C926-C991, C955-C970, C981-C1010 



Fig. 3. Molecular model of the protease domain of corin between amino acids 802 and 1042. A corin model was built based on 
the structure of bovine chymotrypsinogen A, as described under "Experimental Procedures." The active site residues of the catalytic 
triad (His843, Asp892, and Ser985) are shown in purple. Four disulfide bonds in the corin model (Cys828-Cys844, Cys955-Cys970, 
Cys926-Cys991, and Cys981-Cys1010) that correspond to the disulfide bonds in the catalytic domain of chymotrypsinogen 
(Cys42-Cys58, Cys168-Cys182, Cys136-Cys201, and Cys191-Cys220) are shown in blue. The side chains of Cys817 and Cys830 of the 
corin model are in an acceptable proximity to form a disulfide bond (pink). The distance between the C-α atoms from the 
chymotrypsinogen template (Val31 and Gly44) corresponding to these two cysteine residues is 5.08 Å, and the distance between the 
sulfur atoms after rotamer searching of the cysteine side chains is about 2.5 Å. The potential disulfide bond between Cys790 and 
Cys912 of corin corresponding to the disulfide bond between Cys1 and Cys122 of chymotrypsinogen is not included in the model. 



This region contains 88 amino acids and is homologous to the 
cysteine-rich motif found in the macrophage scavenger receptor 
(30). This motif is also present in the sea urchin spermatozoa 
speract receptor (31, 32) and the vertebrate serine protease, 
enterokinase (27). 

At the carboxyl terminus of corin protein between amino acid 
residues 802 and 1042, there is a trypsin-like serine protease 
domain (Fig. 2A). This protease domain is highly homologous to 
the catalytic domain of members of the trypsin superfamily. 
For example, amino acid sequence identities between corin and 
prekallikrein (33), factor XI (34), and hepsin (35) are 40, 40, 
and 38%, respectively. All essential features of serine protease 
sequences are well conserved in corin (Figs. 1 and 2F). The 
active site residues of the catalytic triad are located at His843, 
Asp892, and Ser985. The amino acid residues forming the substrate 
specificity pocket are located at Asp979, Gly1007, and 
Gly1018. These residues are predicted to bind the substrate P1 
residues, suggesting that corin would cleave its substrate after 
basic residues, such as lysine or arginine. In addition, a putative 
activation cleavage site was found at Arg801, suggesting 
that corin would be synthesized as an inactive zymogen and 
that another trypsin-like enzyme was required for its 
activation. 
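
Sequence identity figures such as those quoted above are computed from a pairwise alignment. The sketch below shows one common convention, counting identical residues over the aligned columns in which neither sequence has a gap; the function name and this choice of denominator are assumptions, and the example uses only a 10-residue excerpt from the Fig. 2F alignment, so its result is not comparable to the whole-domain percentages in the text.

```python
# Sketch only: one of several conventions for percent identity; names are hypothetical.
def percent_identity(aligned_a: str, aligned_b: str, gap: str = ".") -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for residue_a, residue_b in zip(aligned_a.upper(), aligned_b.upper()):
        if residue_a == gap or residue_b == gap:
            continue                      # skip columns containing a gap
        compared += 1
        if residue_a == residue_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared

if __name__ == "__main__":
    corin_excerpt = "DSGGPLVCEK"          # excerpt around the active-site serine
    ctra_excerpt = "DSGGPLVCKK"
    print(percent_identity(corin_excerpt, ctra_excerpt))
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, which is why reported identities can differ slightly between programs.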

In the protease domain, there are 12 cysteine residues. Potential 
pairing of these cysteine residues can be predicted by 
comparing with other well studied serine proteases, such as 
trypsin and chymotrypsin. The first three pairs of cysteine residues, 
present in essentially all members of the trypsin superfamily, 
are located at Cys828-Cys844, Cys955-Cys970, and 
Cys981-Cys1010. Two more pairs of cysteine residues are present 
at the positions Cys790-Cys912 and Cys926-Cys991. These 
two pairs of cysteine residues are commonly found in a subfamily 
of two-chain serine proteases, such as chymotrypsin and 
prekallikrein (33). The presence of Cys790 and Cys912 indicated 






Fig. 4. Northern analysis of corin mRNA expression. Human 
and mouse multiple tissue Northern blots were hybridized with human 
and mouse corin cDNA probes, respectively. In human tissues (A and 
B), corin mRNA was detected only in samples from heart. In mouse 
tissues (C), abundant expression of corin mRNA was detected in sam- 
ples from heart. Weak signals were also detected in samples from testis 
and kidney. 

that, after the activation cleavage at Arg801, the catalytic domain 
of corin would remain attached to the rest of the molecule by 
a disulfide bond. Interestingly, there is one additional pair of 
cysteine residues, Cys817 and Cys830, present in corin. Cysteine 
residues at these two positions were not found in any other 
serine proteases in vertebrates. A search of data bases showed 
that a chymotrypsinogen-like serine protease from the lugworm, 
Arenicola marina, had two cysteine residues at the 
corresponding positions.^ A model of the corin protease domain 
was built based on the structure of bovine chymotrypsinogen A 
(Fig. 3). Based on this corin model, where the C-α atoms of 
these two cysteine residues were held fixed during energy 
minimization, the distance between the sulfur atoms of their 
side chains is about 2.5 Å after rotamer searching. The model 
indicates that these two cysteines are likely to form a disulfide 
bond connecting two β-sheets in the core of the protease domain 
(Fig. 3). 
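
The proximity argument above rests on interatomic distances measured in the model. As a minimal sketch, the distance between two atoms is the Euclidean distance between their Cartesian coordinates in ångströms. The coordinates and the cutoff below are hypothetical placeholders, not values from the corin model.

```python
# Sketch only: coordinates and cutoff are hypothetical, chosen purely for illustration.
import math

def atom_distance(atom_a, atom_b) -> float:
    """Euclidean distance between two (x, y, z) coordinates, in the input units."""
    return math.dist(atom_a, atom_b)

def within_cutoff(atom_a, atom_b, cutoff: float) -> bool:
    """True if the two atoms are no farther apart than the chosen cutoff."""
    return atom_distance(atom_a, atom_b) <= cutoff

if __name__ == "__main__":
    sulfur_1 = (10.0, 4.0, 7.5)          # hypothetical sulfur position
    sulfur_2 = (11.5, 6.0, 7.5)          # hypothetical sulfur position
    print(round(atom_distance(sulfur_1, sulfur_2), 2))
    print(within_cutoff(sulfur_1, sulfur_2, cutoff=3.0))
```

In practice the coordinates would be read from the model's coordinate file, and the cutoff would be chosen by the modeler.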

Northern Analysis of Corin mRNA Expression — To determine 
expression of the corin gene in human tissues, Northern 
hybridization was performed using human corin cDNA probes. 
As shown in Fig. 4A, an ~5-kb transcript was detected only in 
the heart but not in other tissues including brain, placenta, 
lung, liver, skeletal muscle, kidney, pancreas, spleen, thymus, 
prostate, testis, ovary, colon, and leukocytes. Since the heart is 
mainly composed of cardiac muscles, Northern analysis was 

^ J. Eberhardt, GenBank™ accession number G1160388. 






Fig. 5. Analysis of corin mRNA expression by in situ hybridization in an adult mouse heart. Tissue sections 
from atrium (B) and ventricle (A) were stained with hematoxylin/eosin. Corin mRNA was detected by in situ 
hybridization using a mouse corin cDNA probe. Expression of corin mRNA was found in the cardiac myocytes of 
both the atrium (D) and the ventricle (C) as shown by white spots. 








Fig. 6. Expression of corin mRNA in the developing heart. Tissue sections were prepared from mouse embryos at day E9.5 (A and B), E11.5 
(C and D), E12.5 (E and F), and E15.5 (G-J) and stained with hematoxylin/eosin (A, C, E, G, and I). Corin mRNA expression was detected by in 
situ hybridization in developing heart by E9.5 (B) and E11.5 (D) as indicated by arrows. The expression was prominent in the primary atrial septum 
and the trabecular ventricular compartment by E12.5 (F). By E15.5, corin mRNA was detected in most cardiac myocytes in both atrium (H) and 
ventricle (J). Abbreviations used in E, G, and I are as follows: Atr, atrium; V, ventricle; Ar, aorta; Vc, vena cava; E, esophagus; Lu, lung. 



performed to examine the presence of corin mRNA in other 
human muscle-rich tissues. Again, corin mRNA was detected 
in the heart but not in uterus, small intestine, bladder, stom- 
ach, and prostate (Fig. 4B). 

To examine corin mRNA expression in mice, the full-length 
mouse corin cDNA was cloned by a PCR-based strategy. Mouse 
corin cDNA shared 89% sequence identities with human corin 
cDNA (data not shown). Northern analysis was performed with 
RNA samples from mouse tissues. As shown in Fig. 4C, a 
prominent transcript of ~6 kb was detected in samples derived 



from the heart. In contrast to Northern analysis with human 
samples, low levels of corin mRNA were also detected in sam- 
ples derived from the testes and kidneys. 

Mouse Corin mRNA Expression in Adult and Embryonic 
Hearts — In situ hybridization was performed to determine the 
temporal and spatial expression of corin mRNA. In adult mice 
(Fig. 5), corin mRNA was detected in cardiac myocytes of both 
atrium and ventricle. The level of expression appeared to be 
higher in the atrium than the ventricle. During embryonic 
development, corin mRNA was first detected at E9.5 in both 







Fig. 7. Expression of corin mRNA in other tissues during embryonic development. Tissue sections were stained 
with hematoxylin/eosin. In situ hybridization was performed using a mouse corin cDNA probe, as described under 
"Experimental Procedures." A and B, expression of corin mRNA in cartilage primordia of vertebral bodies of an E13.5 
embryo. C and D, expression of corin mRNA in the turbinate primordium around the nasal and eye cavities of an E15.5 
embryo. E and F, expression of corin mRNA in a developing digital bone in a front paw at E15.5. Corin mRNA was 
detected in the region adjacent to the hypertrophic chondrocytes and in the perichondrocytes. G and H, in a more 
matured digital bone in a hind limb of an E15.5 embryo, a similar pattern of corin mRNA expression was found in the 
region adjacent to the hypertrophic chondrocytes and in the perichondrocytes. I and J, expression of corin mRNA in 
the medulla of a developing kidney at E15.5. K and L, expression of corin mRNA in the decidual cells of a pregnant 
uterus. Abbreviations used are: V, vertebral bodies; N, nasal cavity; E, eye cavities; Hy, hypertrophic chondrocytes; 
P, perichondrocytes. 




atrium and ventricle of the developing heart (Fig. 6B). Between 
E11.5 and E13.5, corin mRNA was highly expressed in the 
thickened atrial wall and in the regions that underwent tra- 
beculation in the ventricle (Fig. 6, D and F). By E15.5, corin 
mRNA in the heart was more abundant, especially in primary 
atrial septa (Fig. 6H). Weak signals appeared to be present in 
developing aorta and vena cava but not in the esophagus and 
lungs (Fig. 6H). The expression of corin mRNA in the heart was 



maintained in the subsequent embryonic stages (not shown). 

Corin mRNA Expression in Other Tissues — In addition to the 
heart, corin mRNA was also detected in other mouse tissues by 
in situ hybridization. For example, corin mRNA was present in 
the uterus of pregnant mice and in the developing kidneys. In 
the uterus (Fig. 7L), corin mRNA expression was most abundant 
in the decidual cells close to the implantation site of the 
embryo. In the developing kidneys at E15.5, corin mRNA was 




Fig. 8. Analysis of corin mRNA expression in tumor cell lines 
by RT-PCR. RNA samples were isolated from human tumor cell lines. 
RT-PCR experiments were performed using oligonucleotide primers 
derived from human corin cDNA. Corin mRNA was detected in samples 
from Hec-1-A, U2-OS, SK-LMS-1, RL95-2, and AN3-CA cells (upper 
panel, lanes 2-6) but not in samples from HeLa cells (upper panel, lane 
1). In a control experiment, PCR reactions were performed with specific 
oligonucleotide primers for the human GAPDH gene. GAPDH mRNA 
was detected in samples from all cell lines (lower panel, lanes 1-6). 

highly expressed in the stromal cells in the medulla but not in 
the cortex of the kidney (Fig. 7J). This finding was consistent 
with the results of Northern analysis in which a corin transcript 
was found in RNA samples from mouse kidneys (Fig. 4C). 

Interestingly, in situ hybridization also identified corin 
mRNA in several cartilage-derived structures, such as the ver- 
tebra in the tail, the turbinate in the head, and the long bones 
in the limbs (Fig. 7, A-H). Fig. 7B showed the 
expression of corin mRNA in cartilage primordia of vertebral 
bodies in the posterior of an E13.5 embryo. By E15.5, the level 
of corin mRNA expression in the vertebra was much lower as 
the vertebra became more matured (data not shown), indicat- 
ing that corin may play a role in the differentiation of chondro- 
cytes. This notion was supported by the expression of corin 
mRNA in developing limbs. Fig. 7, E and F, showed an early 
developing digital bone that consisted of three types of cells as 
follows: hypertrophic chondrocytes at the center, prehypertro- 
phic chondrocytes next to the hypertrophic zone, and proliferating 
chondrocytes at both ends. Corin mRNA was found 
mostly in the prehypertrophic chondrocytes (Fig. 7F). Hybridization 
signals were also present in perichondrium (Fig. 7F). 
Fig. 7, G and H, showed a long bone in a hind limb that was at 
a more advanced developmental stage. The central hyper- 
trophic zone was replaced by vascularized tissues containing 
bone marrow cells and osteoblasts. Nevertheless, a similar expression 
pattern of corin mRNA was found in the narrow zone 
of the prehypertrophic chondrocytes and in the perichondrium. 
These results indicated that corin expression was associated 
with a specific stage of chondrocyte differentiation. 

Corin mRNA Expression in Human Tumor Cell Lines — A 
number of human cancer cell lines were screened by Northern 
and RT-PCR analyses for the presence of corin mRNA. In most 
cell lines, such as HL60, HeLa, K562, MOLT-4, RAJI, SW480, 
A549, and G36, corin mRNA was undetectable (data not 
shown). However, corin mRNA was found in several cell lines 
derived from uterus tumors or osteosarcoma. As shown in Fig. 
8, corin mRNA was detected by RT-PCR in endometrium carcinoma 
cell lines HEC-1-A, AN3 CA, and RL95-2, leiomyosarcoma 
cell line SK-LMS-1, as well as in osteosarcoma cell line 
U2-OS. The result is consistent with the finding by in situ 
hybridization in which corin mRNA was highly expressed in 
the developing bones in embryos as well as in the maternal 
uterus. 

Chromosomal Localization of the Human Corin Gene — FISH 
analysis was performed to determine the chromosomal locus of 
the human corin gene. Specific fluorescent spots were found at 
4p12-13, a region adjacent to the centromere on the short arm 
of chromosome 4 (Fig. 9). The result was confirmed in a subse- 
quent experiment in which a genomic probe previously mapped 
to 4p15.3 was co-localized with the corin gene probe (data not 



shown). A search of the OMIM human genetic data base indicated 
that a congenital heart disease locus, total anomalous 
pulmonary venous return (TAPVR), was previously mapped to 
this region at 4p13-q12 (37). 

DISCUSSION 

In this study, we describe the cloning and initial character- 
ization of a novel cDNA from the human heart that encodes a 
putative transmembrane serine protease, which we have des- 
ignated as corin. The presence of a hydrophobic transmem- 
brane domain at its amino terminus and the absence of a signal 
peptide suggest that corin is a type II transmembrane protein. 
In the extracellular region of corin, there is a trypsin-like 
catalytic domain that contains all conserved structural fea- 
tures of serine proteases, such as the catalytic triad, the acti- 
vation cleavage site, the substrate specificity pocket, and the 
essential cysteine residues. Interestingly, the protease domain 
of corin contains two unique cysteine residues, Cys®^^ and 
Cys830, that are not present in other trypsin-like serine proteases 
in vertebrates. Molecular modeling showed that these 
two cysteine residues are likely to form a disulfide bond connecting 
two β-sheets in the core of the protease domain (Fig. 3). 
A search of genomic data bases showed that a chymotrypsin- 
like protease found in the lugworm, A. marina, also has two 
cysteine residues at the corresponding positions. It is not clear 
whether these two cysteine residues are maintained through a 
convergent or divergent evolution. Nevertheless, the presence 
of such an unusual pair of cysteine residues in both corin and 
the lugworm protease suggests an important biological func- 
tion of the disulfide bond. One potential possibility is that the 
disulfide bond may contribute to stability of the proteases. 

Although members of the trypsin superfamily are known to 
contain a variety of domain structures such as kringle and 
epidermal growth factor-like domains that are important for 
protein-protein interactions, this is the first report of the presence 
of a frizzled-like cysteine-rich domain in this extended 
family. Originally, the frizzled gene was identified in Drosoph- 
ila (38). The gene encodes a seven-transmembrane receptor 
that is required for proper development of hairs, bristles, and 
ommatidia of the fruit fly (19, 39). Later, other Frizzled proteins 
were identified in many other species. They all 
contain a well conserved extracellular cysteine-rich domain 
and a seven-transmembrane domain and act as receptors for 
secreted Wnt glycoproteins (for review see Refs. 40 and 41). The 
cysteine-rich domain, which is about 120 amino acids in length 
and contains a motif of 10 invariantly spaced cysteine residues, 
has been shown to be necessary and sufficient for the binding of 
the Wnt ligands (20, 42). Recent studies demonstrated that 
Frzb, a secreted frizzled-like protein without the seven-transmembrane 
domain, is expressed in the Spemann organizer of 
frog embryos and can bind and inhibit Wnt-8 (43, 44). In 
addition, similar frizzled-like cysteine-rich domains have also 
been found in several other proteins, including mouse collagen 
(XVIII) α1 chain (45), human carboxypeptidase Z (46), and 
several receptor tyrosine kinases (47-49). The function of the 
cysteine-rich domain in these proteins has not been determined. 
Corin is unique in that it contains the frizzled-like 
cysteine-rich domains and a serine protease domain. The presence 
of frizzled-like domains in corin implies that corin may 
play an important role in development by directly interacting 
with Wnt proteins. 

The temporal and spatial pattern of corin gene expression 
further supported a potential developmental function of corin. 
In mice, corin mRNA was detected in the cardiac myocytes of 
the embryonic heart as early as E9.5 (Fig. 6B). The expression 
was most prominent in the primary atrial septum and the 
trabecular ventricular compartment by E11.5-13.5 (Fig. 6, D 







Fig. 9. Chromosomal localization of the human corin gene by FISH. A fluorescent-labeled genomic DNA probe containing the human 
corin gene was hybridized to metaphase chromosomes derived from PHA-stimulated peripheral blood lymphocytes. Hybridization signals are 
shown as bright blue spots and indicated by white arrows (left panel). The position of the corin locus on human chromosome 4 is illustrated in a 
diagram (right panel). 



and F). During this period, an active process of looping and 
remodeling takes place in the embryonic heart. As a result, 
outflow tracts are formed, and the original single tube-like 
heart is reorganized into a four-chambered structure. Growth 
factors, such as bone morphogenic proteins and the transforming 
growth factor-β family members, are known to play a critical 
role during the embryonic heart development (50). Recent 
studies in Drosophila showed that the wingless (wg) gene, a 
homologue of the wnt oncogene in mammals, is directly in- 
volved in heart formation (51). It has been suggested that 
similar signaling pathways also contribute to heart development 
in vertebrates (52). It is possible that corin could participate 
in such developmental pathways by interacting directly 
with Wnt proteins or other growth factors. 

In addition to the heart, corin mRNA was identified in other 
tissues, such as the pregnant uterus and developing kidneys 
and bones. The expression of corin mRNA in these tissues 
appeared to be cell type-specific. For example, in developing 
long bones corin mRNA was specifically expressed in the prehypertrophic 
chondrocytes. It is known that skeletal bones are 
derived from two different processes, intramembranous and 
endochondral ossification. In the former case, mesenchymal 
tissues are directly converted into bones, whereas in the latter 
case the mesenchymal cell is converted to bone via cartilage as 
an intermediate step. The vertebrae, long bones, and certain 
fragments of skull are formed by endochondral ossification (53). 
In these bones, mesenchymal cells first become chondrocytes 
that in turn differentiate from proliferating chondrocytes to 
prehypertrophic chondrocytes and finally to hypertrophic chondrocytes. 
The hypertrophic chondrocytes eventually undergo 
apoptosis followed by vascularization and ossification. This 
process of chondrocyte differentiation has been shown to be 
tightly regulated by hedgehog proteins, bone morphogenic proteins, 
and parathyroid hormone-related protein (54-57). The 
specific expression of corin mRNA in a subset of chondrocytes 
indicated that corin may also be involved in this cell differen- 
tiation process. 

Finally, by FISH analysis the human corin gene was located 
on the short arm of chromosome 4 (4p12-13) (Fig. 9). A search 
of the OMIM human genetic data base showed that a disease 



locus, total anomalous pulmonary venous return (TAPVR), had 
been previously mapped to this region. TAPVR is a rare cya- 
notic form of congenital heart defects in which the pulmonary 
vein connected abnormally to the right atrium or one of the 
venous tributaries instead of the left atrium. The molecular 
mechanism responsible for this developmental defect in the 
heart is unknown. A linkage study of a large Utah-Idaho family 
that included 14 affected individuals localized the TAPVR locus 
to a 30-centimorgan interval on 4p13-q12 (37). The findings 
that the corin gene and the TAPVR locus are co-localized on 
chromosome 4 and that corin mRNA is highly expressed in the 
embryonic heart, particularly in the region where outflow 
tracts were formed, suggest that corin is an attractive candi- 
date for the TAPVR gene. The isolation of the corin cDNA 
provided a useful tool to study further this intriguing 
possibility. 

Acknowledgments — We thank Drs. W. Dole and G. Rubanyi for their 
encouragement and helpful discussions. 

REFERENCES 

1. Neurath, H. (1984) Science 224, 350-357
2. Huber, R., and Bode, W. (1978) Acc. Chem. Res. 11, 114-122
3. Davie, E. W., Fujikawa, K., and Kisiel, W. (1991) Biochemistry 30, 10363-10370
4. Morisato, D., and Anderson, K. V. (1995) Annu. Rev. Genet. 29, 371-399
5. LeMosy, E. K., Kemler, D., and Hashimoto, C. (1998) Development 125, 4045-4053
6. Konrad, K. D., Goralski, T. J., Mahowald, A. P., and Marsh, J. L. (1998) Proc. Natl. Acad. Sci. U. S. A. 95, 6819-6824
7. Anderson, K. V., Schneider, D. S., Morisato, D., Jin, Y., and Ferguson, E. L. (1992) Cold Spring Harbor Symp. Quant. Biol. 57, 409-417
8. Tsuji, A., Torres-Rosado, A., Arai, T., Le Beau, M. M., Lemons, R. S., Chou, S. H., and Kurachi, K. (1991) J. Biol. Chem. 266, 16948-16953
9. Torres-Rosado, A., O'Shea, K. S., Tsuji, A., Chou, S. H., and Kurachi, K. (1993) Proc. Natl. Acad. Sci. U. S. A. 90, 7181-7185
10. Wu, Q., Yu, D., Post, J., Halks-Miller, M., Sadler, J. E., and Morser, J. (1998) J. Clin. Invest. 101, 321-326
11. Appel, L. F., Prout, M., Abu-Shumays, R., Hammonds, A., Garbe, J. C., Fristrom, D., and Fristrom, J. (1993) Proc. Natl. Acad. Sci. U. S. A. 90, 4937-4941
12. Yamaoka, K., Masuda, K., Ogawa, H., Takagi, K., Umemoto, N., and Yasuoka, S. (1998) J. Biol. Chem. 273, 11895-11901
13. Paoloni-Giacobino, A., Chen, H., Peitsch, M. C., Rossier, C., and Antonarakis, S. E. (1997) Genomics 44, 309-320
14. Jen, Y., Manova, K., and Benezra, R. (1997) Dev. Dyn. 208, 92-106
15. Wang, D., Bode, W., and Huber, R. (1985) J. Mol. Biol. 185, 595-624
16. Ponder, J. W., and Richards, F. M. (1987) J. Mol. Biol. 193, 775-791
17. Bernstein, F. C., Koetzle, T. F., Williams, G. J., Meyer, E. E., Jr., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T., and Tasumi, M. (1977) J. Mol. Biol. 112, 535-542
18. Hartmann, E., Rapoport, T. A., and Lodish, H. F. (1989) Proc. Natl. Acad. Sci. U. S. A. 86, 5786-5790
19. Vinson, C. R., Conover, S., and Adler, P. N. (1989) Nature 338, 263-264
20. Bhanot, P., Brink, M., Samos, C. H., Hsieh, J. C., Wang, Y., Macke, J. P., Andrew, D., Nathans, J., and Nusse, R. (1996) Nature 382, 225-230
21. Sawa, H., Lobel, L., and Horvitz, H. R. (1996) Genes Dev. 10, 2189-2197
22. Chan, S. D., Karpf, D. B., Fowlkes, M. E., Hooks, M., Bradley, M. S., Vuong, V., Bambino, T., Liu, M. Y., Arnaud, C. D., Strewler, G. J., and Nissenson, R. A. (1992) J. Biol. Chem. 267, 25202-25207
23. Brown, M. S., Herz, J., and Goldstein, J. L. (1997) Nature 388, 629-630
24. Krieger, M., and Herz, J. (1994) Annu. Rev. Biochem. 63, 601-637
25. Kounnas, M. Z., Chappell, D. A., Strickland, D. K., and Argraves, W. S. (1993) J. Biol. Chem. 268, 14176-14181
26. Catterall, C. F., Lyons, A., Sim, R. B., Day, A. J., and Harris, T. J. (1987) Biochem. J. 242, 849-856
27. Kitamoto, Y., Yuan, X., Wu, Q., McCourt, D. W., and Sadler, J. E. (1994) Proc. Natl. Acad. Sci. U. S. A. 91, 7588-7592
28. Schonbaum, C. P., Lee, S., and Mahowald, A. P. (1995) Proc. Natl. Acad. Sci. U. S. A. 92, 1485-1489
29. Hong, C. C., and Hashimoto, C. (1995) Cell 82, 785-794
30. Matsumoto, A., Naito, M., Itakura, H., Ikemoto, S., Asaoka, H., Hayakawa, I., Kanamori, H., Aburatani, H., Takaku, F., Suzuki, H., Kobari, Y., Miyai, T., Takahashi, K., Cohen, E. H., Wydro, R., Housman, D. E., and Kodama, T. (1990) Proc. Natl. Acad. Sci. U. S. A. 87, 9133-9137
31. Thorpe, D. S., and Garbers, D. L. (1989) J. Biol. Chem. 264, 6545-6549
32. Dangott, L. J., Jordan, J. E., Bellet, R. A., and Garbers, D. L. (1989) Proc. Natl. Acad. Sci. U. S. A. 86, 2128-2132
33. Chung, D. W., Fujikawa, K., McMullen, B. A., and Davie, E. W. (1986) Biochemistry 25, 2410-2417
34. Fujikawa, K., Chung, D. W., Hendrickson, L. E., and Davie, E. W. (1986) Biochemistry 25, 2417-2424
35. Leytus, S. P., Loeb, K. R., Hagen, F. S., Kurachi, K., and Davie, E. W. (1988) Biochemistry 27, 1067-1074
36. Kyte, J., and Doolittle, R. F. (1982) J. Mol. Biol. 157, 105-132
37. Bleyl, S., Nelson, L., Odelberg, S. J., Ruttenberg, H. D., Otterud, B., Leppert, M., and Ward, K. (1995) Am. J. Hum. Genet. 56, 408-415
38. Gubb, D., and Garcia-Bellido, A. (1982) J. Embryol. Exp. Morphol. 68, 37-57
39. Zheng, L., Zhang, J., and Carthew, R. W. (1995) Development 121, 3045-3055
40. Cadigan, K. M., and Nusse, R. (1997) Genes Dev. 11, 3286-3305
41. Shulman, J. M., Perrimon, N., and Axelrod, J. D. (1998) Trends Genet. 14, 452-458
42. Lin, K., Wang, S., Julius, M. A., Kitajewski, J., Moos, M., Jr., and Luyten, F. P. (1997) Proc. Natl. Acad. Sci. U. S. A. 94, 11196-11200
43. Wang, S., Krinks, M., Lin, K., Luyten, F. P., and Moos, M., Jr. (1997) Cell 88, 757-766
44. Leyns, L., Bouwmeester, T., Kim, S. H., Piccolo, S., and De Robertis, E. M. (1997) Cell 88, 747-756
45. Rehn, M., and Pihlajaniemi, T. (1995) J. Biol. Chem. 270, 4705-4711
46. Song, L., and Fricker, L. D. (1997) J. Biol. Chem. 272, 10543-10550
47. Xu, Y. K., and Nusse, R. (1998) Curr. Biol. 8, R405-R406
48. Masiakowski, P., and Yancopoulos, G. D. (1998) Curr. Biol. 8, R407
49. Saldanha, J., Singh, J., and Mahadevan, D. (1998) Protein Sci. 7, 1632-1635
50. Wu, X., Golden, K., and Bodmer, R. (1995) Dev. Biol. 169, 619-628
51. Park, M., Wu, X., Golden, K., Axelrod, J. D., and Bodmer, R. (1996) Dev. Biol. 177, 104-116
52. Bodmer, R., and Venkatesh, T. V. (1998) Dev. Genet. 22, 181-186
53. Gilbert, S. F. (1994) Developmental Biology (Gilbert, S. F., ed) 4th Ed., Sinauer Associates, Inc., Sunderland, MA
54. Lanske, B., Karaplis, A. C., Lee, K., Luz, A., Vortkamp, A., Pirro, A., Karperien, M., Defize, L. H. K., Ho, C., Mulligan, R. C., Abou-Samra, A. B., Juppner, H., Segre, G. V., and Kronenberg, H. M. (1996) Science 273, 663-666
55. Storm, E. E., Huynh, T. V., Copeland, N. G., Jenkins, N. A., Kingsley, D. M., and Lee, S. J. (1994) Nature 368, 639-643
56. Vortkamp, A., Lee, K., Lanske, B., Segre, G. V., Kronenberg, H. M., and Tabin, C. J. (1996) Science 273, 613-622
57. Zou, H., Wieser, R., Massague, J., and Niswander, L. (1997) Genes Dev. 11, 2191-2203



Exhibit 45 



BIOCHEMISTRY 



Coordinating Author GEOFFREY ZUBAY 

COLUMBIA UNIVERSITY 



ADDISON- WESLEY PUBLISHING COMPANY 

READING, MASSACHUSETTS • MENLO PARK, CALIFORNIA • LONDON • AMSTERDAM • DON MILLS, ONTARIO • SYDNEY 







Sponsoring Editor                 Bob Rogers
Production Editor                 Marcia Mirski
Copy Editor                       James K. Madni
Text Designer                     Vanessa Pineiro
Illustrator                       Illustration Concepts, Michael Ockler
Cover Designer and Illustrator    Hannus Design Associates, Richard Hannus
Art Coordinator                   Kristin Belanger
Production Manager                Karen M. Guardino
Production Coordinator            Peter Petraitis

The text of this book was composed in Trump by York Graphic Services, 



Illustrations rendered and copyrighted by Irving Geis: Figures 1.1, 1.2, 1.3, 1.6, 1.7, 1.8, 1.9, 1.10, 1.11, 
1.13, 1.14, 1.15, 1.16, 1.17, 1.18, 1.19, 1.20, 1.21, 1.23, 1.26, 1.27, 1.28, 1.29, 1.32, 1.33, 3.4, 3.7, 3.16, 
3.17(a), 3.43, 3.44, 3.45, 3.46, 3.47, 3.54(a), 3.58 (lower half), 3.59, 4.7, 4.8, 4.9, 4.10, 4.15, 4.21, 10.9, 
12 opener, 18.16, 18.35(b). 



Library of Congress Cataloging in Publication Data 

Zubay, Geoffrey L. 
Biochemistry. 



Includes bibliographies and index. 

1. Biological chemistry. I. Title. 
QP514.2.Z83 1983 574.19'2 
ISBN 0-201-09091-0 



82-18502 



Copyright © 1983 by Addison-Wesley Publishing Company, Inc. 

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, 
or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, 
or otherwise, without the prior written permission of the publisher. Printed in the United 
States of America. Published simultaneously in Canada. 

ISBN 0-201-09091-0 
ABCDEFGHIJ-Da89876543 







include the small glycine (a single hydrogen atom) and alanine, serine and 
threonine (with attached hydroxyls), and cysteine (with its sulfhydryl). 
Proline has a hydrocarbon side chain, but its conformational properties put 
it at corners and therefore often outside. 

Results of x-ray crystallography show these classifications by polarity 
and location to be valid in general for soluble globular proteins. The structures 
of myoglobin and hemoglobin, lysozyme, and cytochrome c all have 
buried hydrophobic side chains with hydrophilic side chains on the surface. 
Figure 1-11 shows the positions of all 104 side chains for horse heart 
cytochrome c. This is a protein with a heme group like myoglobin, but 
with an entirely different function. It is one of a chain of molecules that 
transports electrons in the mitochondria. Hydrophobic side chains (colored) 
pack inside the molecule, especially against the left side of the heme 
ring, and hydrophilic side chains (grey) are distributed over the surface of 
the molecule. This is a clear example of one way in which sequence dictates 
folding. 

Other side chains have pronounced effects on three-dimensional conformation, 
particularly proline and the sulfur-containing cysteine. The 
side chain of proline contains a portion of the main chain and thus tends to 
change the direction of the main chain. Proline is often used to produce a 
bend in the protein chain, and many of the α helices in myoglobin and 
hemoglobin begin with a proline residue. The side chain -SH of cysteine 
can make a covalent -S-S- linkage with a similar residue from another 
protein chain (Figure 1-12). After the protein chain has reached its optimal 
low-energy conformation, the disulfide bonds can increase its stability. The 
enzyme ribonuclease contains four such disulfide bridges. If the -S-S- 
linkages are broken and the protein chain is made to unfold in the presence 
of a denaturing agent, such as urea, would it refold when the denaturing 
chemicals were removed? Christian Anfinsen and coworkers answered this 
question in the affirmative in the early 1960s with a classic set of experiments. 

We have seen that sequence determines folding, but, in fact, it does 
more than that. It determines a unique folding pattern. The importance of 
the folding pattern can be appreciated through a consideration of the protein's 
function. Enzymes, for example, are molecular machines that operate 
with great precision on other molecules called substrates. Chymotrypsin is 
one of a class of pancreatic digestive enzymes that cuts other protein 
chains. The substrate is a polypeptide chain that is held on the surface of 
the enzyme so that a peptide bond can be cleaved. It is necessary that the 
substrate mesh with the enzyme in an exact lock-and-key fashion. In chymotrypsin 
there is a specificity pocket that fits an aromatic ring side chain 
of the substrate. Immediately adjoining the specificity pocket is an active 
site that assists in cutting a peptide bond near the bound aromatic ring. 



13 

CHAPTER 1 

PROTEINS: 
AN OVERVIEW 



Figure 1-10 

The 20 amino acid side chains classified by their probable position in the protein 
molecule. Three-letter and one-letter codes are given for each. The forms 
shown here are the most prevalent at pH 7. Note that histidine can play a dual 
role: neutral (as shown here) or positively charged. 





Exhibit 46 



United States Patent and Trademark Office 

UNITED STATES DEPARTMENT OF COMMERCE 
United States Patent and Trademark Office 
Address: COMMISSIONER FOR PATENTS 
P.O. Box 1450 
Alexandria, Virginia 22313-1450 
www.uspto.gov 



APPLICATION NO.          09/776,191
FILING DATE              02/02/2001
FIRST NAMED INVENTOR     Edwin L. Madison
ATTORNEY DOCKET NO.      24745-1607
CONFIRMATION NO.         3237



20985 7590 04/21/2006 

FISH & RICHARDSON, PC 
P.O. BOX 1022 

MINNEAPOLIS, MN 55440-1022 



EXAMINER        PAK, YONG D
ART UNIT        1652
PAPER NUMBER



DATE MAILED: 04/21/2006 



Please find below and/or attached an Office communication concerning this application or proceeding. 



PTO-90C (Rev. 10/03) 





Office Action Summary

Application No.    09/776,191
Applicant(s)       MADISON ET AL.
Examiner           Yong D. Pak
Art Unit           1652





— The MAILING DATE of this communication appears on the cover sheet with the correspondence address — 
Period for Reply 



A SHORTENED STATUTORY PERIOD FOR REPLY IS SET TO EXPIRE 3 MONTH(S) OR THIRTY (30) DAYS, 
WHICHEVER IS LONGER, FROM THE MAILING DATE OF THIS COMMUNICATION. 

- Extensions of time may be available under the provisions of 37 CFR 1.136(a). In no event, however, may a reply be timely filed 
after SIX (6) MONTHS from the mailing date of this communication. 

- If NO period for reply is specified above, the maximum statutory period will apply and will expire SIX (6) MONTHS from the mailing date of this communication. 

- Failure to reply within the set or extended period for reply will, by statute, cause the application to become ABANDONED (35 U.S.C. § 133). 
Any reply received by the Office later than three months after the mailing date of this communication, even if timely filed, may reduce any 
earned patent term adjustment. See 37 CFR 1.704(b). 

Status 

1) ☒ Responsive to communication(s) filed on 30 January 2006. 
2a) □ This action is FINAL. 2b) ☒ This action is non-final. 

3) □ Since this application is in condition for allowance except for formal matters, prosecution as to the merits is 
closed in accordance with the practice under Ex parte Quayle, 1935 C.D. 11, 453 O.G. 213. 

Disposition of Claims 

4) ☒ Claim(s) See Continuation Sheet is/are pending in the application. 

4a) Of the above claim(s) 1-3, 5, 10-13, 19-20, 34-36, 40-46, 48-55, 108-109, 113-116, 118-120 and 122-126 
is/are withdrawn from consideration. 

5) □ Claim(s) ____ is/are allowed. 

6) ☒ Claim(s) 1-3, 5, 11-13, 19, 20, 34-36, 40-42, 113 and 114 is/are rejected. 

7) □ Claim(s) ____ is/are objected to. 

8) □ Claim(s) ____ are subject to restriction and/or election requirement. 

Application Papers 

9) □ The specification is objected to by the Examiner. 

10) □ The drawing(s) filed on ____ is/are: a) □ accepted or b) □ objected to by the Examiner. 

Applicant may not request that any objection to the drawing(s) be held in abeyance. See 37 CFR 1.85(a). 
Replacement drawing sheet(s) including the correction is required if the drawing(s) is objected to. See 37 CFR 1.121(d). 

11) □ The oath or declaration is objected to by the Examiner. Note the attached Office Action or form PTO-152. 

Priority under 35 U.S.C. § 119 

12) □ Acknowledgment is made of a claim for foreign priority under 35 U.S.C. § 119(a)-(d) or (f). 
a) □ All b) □ Some * c) □ None of: 

1. □ Certified copies of the priority documents have been received. 

2. □ Certified copies of the priority documents have been received in Application No. ____. 

3. □ Copies of the certified copies of the priority documents have been received in this National Stage 
application from the International Bureau (PCT Rule 17.2(a)). 
* See the attached detailed Office action for a list of the certified copies not received. 



Attachment(s) 

1) □ Notice of References Cited (PTO-892) 
2) □ Notice of Draftsperson's Patent Drawing Review (PTO-948) 
3) □ Information Disclosure Statement(s) (PTO-1449 or PTO/SB/08), Paper No(s)/Mail Date ____ 
4) □ Interview Summary (PTO-413), Paper No(s)/Mail Date ____ 
5) □ Notice of Informal Patent Application (PTO-152) 
6) □ Other: ____ 



U.S. Patent and Trademark Office 
PTOL-326 (Rev. 7-05) 



Office Action Summary 



Part of Paper No./Mail Date 20060404 



Continuation Sheet (PTOL-326) Application No. 09/776,191 



Continuation of Disposition of Claims: Claims pending in the application are 1-3, 5, 10-13, 19, 20, 34-36, 40-46, 48-55, 
108, 109, 113-116, 118-120 and 122-126. 



2 



Application/Control Number: 09/776,191 
Art Unit: 1652 



Page 2 



DETAILED ACTION 

This application is a CIP of 09/657,986, now issued as U.S. Patent No. 
6,797,504. 

Continued Examination Under 37 CFR 1.114 

A request for continued examination under 37 CFR 1.114, including the fee set 
forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this 
application is eligible for continued examination under 37 CFR 1.114, and the fee set 
forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action 
has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on January 
30, 2006, amending claims 1, 5, 12-13, and 113-114 and canceling claims 6-7, 9-10, 14, 
16, 18 and 137, has been entered. 

Claims 1-3, 5, 10-13, 19-20, 34-36, 40-46, 48-55, 108-109, 113-116, 118-120 and 
122-126 are pending. Claims 1-3, 5, 10-13, 19-20, 34-36, 40-46, 48-55, 108-109, 
113-116, 118-120 and 122-126 are withdrawn. Claims 1-3, 5, 11-13, 19-20, 34-36, 40-42 
and 113-114 are under consideration. 

Priority 

Applicant's claim for domestic priority under 35 U.S.C. 119(e) is acknowledged. 
However, the provisional applications upon which priority is claimed fail to provide 
adequate support under 35 U.S.C. 112 for claims 11-13 and 34 of this application. 




Provisional applications 60/179,982, 60/183,542, 60/213,124, 60/220,970 and 
60/234,840 fail to provide adequate support for polypeptides comprising the serine 
protease domain of MTSP1. Provisional applications 60/179,982 and 60/183,542 
describe polypeptides related to MTSP3, and provisional applications 60/213,124, 
60/220,970 and 60/234,840 describe polypeptides related to MTSP4. 

Therefore, the effective filing date for purpose of prior art is the filing date of 
09/657,986, which is 9/8/2000. 

Response to Arguments 

Applicant's amendment and arguments filed on January 30, 2006, have been 
fully considered and are deemed to be persuasive to overcome the rejections previously 
applied. Rejections and/or objections not reiterated from previous office actions are 
hereby withdrawn. 

Claim Objections 

Claims 11-13 and 34 are objected to for being drawn to non-elected subject matter. 
In response to the previous Office Action, applicants have traversed the above rejection. 
Applicants argue that claims 11-13 and 34 are directed to elected subject matter. Even 
though the claims are drawn to MTSP1, the elected subject matter, the claims are also 
drawn to non-elected subject matter, i.e. MTSP3 (SEQ ID NO:4), MTSP4 (SEQ ID 
NO:6), MTSP6 (SEQ ID NO:12), corin, enteropeptidase, human airway trypsin-like 
protease, TMPRSS2, and TMPRSS4. Hence the objection is maintained. 




Claim Rejections - 35 USC §112 

The following is a quotation of the second paragraph of 35 U.S.C. 112: 

The specification shall conclude with one or more claims particularly pointing out and distinctly 
claiming the subject matter which the applicant regards as his invention. 

Claims 1-3, 5, 11-12, 13 and claims 19-20, 34-36, 40-42 and 113-114 depending 
therefrom are rejected under 35 U.S.C. 112, second paragraph, as being indefinite for 
failing to particularly point out and distinctly claim the subject matter which applicant 
regards as the invention. 

Claims 1-3, 5, 11-12, 13 recite the phrase "substantially purified single-chain 
polypeptide". The metes and bounds of the phrase in the context of the above claims 
are not clear to the Examiner. It is not clear to the Examiner what is considered as 
"substantially purified" by the applicants. A perusal of the specification did not provide a 
clear definition for the above phrase. Without a clear definition, those skilled in the art 
would be unable to conclude if a polypeptide is a "substantially purified" polypeptide 
without knowing the metes and bounds of the phrase. Examiner requests clarification of 
the above phrase. 

Claim 1 and claims 2-3, 5, 11-13, 19-20, 34-36, 40-42 and 113-114 depending 
therefrom are rejected under 35 U.S.C. 112, second paragraph, as being indefinite for 
failing to particularly point out and distinctly claim the subject matter which applicant 
regards as the invention. 

Claim 1 recites the phrase "the MTSP protease domain or catalytically active 
fragment thereof is the only portion of the single-chain polypeptide from the MTSP". 




The metes and bounds of the phrase in the context of the claim are not clear. It is not 
clear to the Examiner as to how one skilled in the art would identify a given amino acid 
sequence as being "from MTSP" or not being "from MTSP". Examiner has interpreted 
the claims broadly to mean that a "single-chain polypeptide comprising a MTSP 
protease domain or catalytically active fragment thereof is the only portion of the 
single-chain polypeptide from the MTSP" is a "single-chain polypeptide comprising a fragment 
consisting of a protease domain or a catalytically active fragment thereof". Examiner 
requests clarification of the above phrase. 

Claims 12-13 and claims 113-114 depending therefrom are rejected under 35 
U.S.C. 112, second paragraph, as being indefinite for failing to particularly point out and 
distinctly claim the subject matter which applicant regards as the invention. 

Claims 12-13 recite the phrase "protease domain has a sequence of amino acid 
residues set forth as amino acids 615-855 of SEQ ID NO:2" or "protease domain whose 
sequence of amino acid residues is set forth as amino acid residues 615-855 of SEQ ID 
NO:2". The metes and bounds of the phrase in the context of the claims are not clear. 
It is not clear to the Examiner if the recited amino acid sequence has the amino acid 
sequence of SEQ ID NO:2 or is a representative member of a genus. Examiner 
suggests amending the phrase as "protease domain comprises amino acids 615-855 of 
SEQ ID NO:2" to clearly indicate that the protease domain has the amino acids 615-855 
of SEQ ID NO:2. 




Claims 19-20 are rejected under 35 U.S.C. 112, second paragraph, as being 
indefinite for failing to particularly point out and distinctly claim the subject matter which 
applicant regards as the invention. 

Claims 19-20 recite the phrase "free Cys". The metes and bounds of the phrase 
in the context of the above claims are not clear to the Examiner. It is not clear to the 
Examiner what is considered as "free Cys" by the applicants. A perusal of the 
specification did not provide a clear definition for the above phrase. Without a clear 
definition, those skilled in the art would be unable to conclude if Cys is "free". Examiner 
requests clarification of the above phrase. 

Claim 19 is rejected under 35 U.S.C. 112, second paragraph, as being indefinite 
for failing to particularly point out and distinctly claim the subject matter which applicant 
regards as the invention. 

Claim 19 recites the phrase "exhibits proteolytic activity". The metes and bounds 
of the phrase in the context of the above claim are not clear to the Examiner. It is not 
clear to the Examiner either from the specification or from the claims as to what 
applicants mean by the above phrase. Examiner requests clarification of the above 
phrase. 

The following is a quotation of the first paragraph of 35 U.S.C. 112: 

The specification shall contain a written description of the invention, and of the manner and process of 
making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the 
art to which it pertains, or with which it is most nearly connected, to make and use the same and shall 
set forth the best mode contemplated by the inventor of carrying out his invention. 




Claims 1-3, 5, 9, 11, 19-20, 34-36, 40-42 and 113-114 are rejected under 35 
U.S.C. 112, first paragraph, as containing subject matter which was not described in the 
specification in such a way as to reasonably convey to one skilled in the relevant art that 
the inventor(s), at the time the application was filed, had possession of the claimed 
invention. 

Claims 1-3, 5, 9, 19-20, 35-36, 40-42 and 113-114 are drawn to a polypeptide 
comprising a protease or catalytically active portion of type-II membrane-type serine 
protease (MTSP) from any source. Claims 11 and 34 limit the MTSP polypeptide to a 
MTSP1 polypeptide from any source. Therefore, these claims are drawn to a genus of 
polypeptides having any structure. The specification only teaches four species, amino 
acids 615-855 of SEQ ID NO:2, amino acids 205-437 of SEQ ID NO:4, amino acids 
of SEQ ID NO:6 and amino acids 217-443 of SEQ ID NO:11. These species are not 
enough to describe the whole genus and there is no evidence on the record of the 
relationship between the structure of the above catalytically active protease domains of 
SEQ ID NOs: 2, 4, 6 and 11 and the structure of the serine protease domain of any or 
all MTSP polypeptides or MTSP1 polypeptides. Further, the specification does not 
describe the structure of a catalytically active portion of any or all MTSP polypeptides. 
Therefore, the specification fails to describe a representative species of the genus of 
polypeptides comprising a serine protease domain or a catalytically active portion of a 
MTSP polypeptide. 

Given this lack of description of the representative species encompassed by the 
genus of the claims, the specification fails to sufficiently describe the claimed invention 
in such full, clear, concise, and exact terms that a skilled artisan would recognize that 
applicants were in possession of the inventions of claims 1-3, 5, 9, 11, 19-20, 34-36, 
40-42 and 113-114. 

Applicant is referred to the revised guidelines concerning compliance with the 
written description requirement of U.S.C. 112, first paragraph, published in the Official 
Gazette and also available at www.uspto.gov. 

In response to the previous Office Action, applicants have traversed the above 
rejection. 


Applicants argue that the claims meet the written description guideline since the 
specification teaches common elements of MTSP and protease domains of MTSPs, 
thereby providing structural and functional characteristics of the various species. 
Applicants also argue that the specification explicitly provides several catalytically active 
portions of MTSP, SEQ ID NO:2, 4, 6 and 11 (MTSP1, MTSP3, MTSP4 and MTSP6), 
along with how to make other catalytically active fragments of MTSP, and therefore, the 
specification provides "relevant, identifying characteristics" of a representative number 
of species of the claimed genus. Examiner respectfully disagrees. The claims are 
drawn to polypeptides comprising any protease domains or any or all catalytically active 
fragments of said protease domains of any or all MTSP or any or all MTSP1, including 
any or all recombinants, variants and mutants of said MTSP or MTSP1. The claims are 
drawn to polypeptides having any structure and therefore, the claims are drawn to a 
genus encompassing species having substantial variation, and the specification fails to 
describe a representative number of species. As discussed in the written description guidelines, 
the written description requirement for a claimed genus may be satisfied through 
sufficient description of a representative number of species by actual reduction to 
practice, reduction to drawings, or by disclosure of relevant, identifying characteristics, 
i.e., structure or other physical and/or chemical properties, by functional characteristics 
coupled with a known or disclosed correlation between function and structure, or by a 
combination of such identifying characteristics, sufficient to show the applicant was in 
possession of the claimed genus. A representative number of species means that the 
species which are adequately described are representative of the entire genus. Thus, 
when there is substantial variation within the genus, one must describe a 
sufficient variety of species to reflect the variation within the genus. Satisfactory 
disclosure of a representative number depends on whether one of skill in the art would 
recognize that the applicant was in possession of the necessary common attributes or 
features of the elements possessed by the members of the genus in view of the species 
disclosed. For inventions in an unpredictable art, adequate written description of a 
genus which embraces widely variant species cannot be achieved by disclosing only 
one species within the genus. In the instant case the claimed genera of the claims are 
drawn to species which are widely variant in structure. The genus of the claims is 
structurally diverse as it encompasses any catalytically active protease domains of any 
or all MTSP or MTSP1, excepting having serine protease activity. As such, neither the 
description of solely structural features present in all members of the genus is sufficient 
to be representative of the attributes and features of the entire genus. 
Hence the rejection is maintained. 






Claims 1-3, 5, 9, 19-20, 34-36, 40-42 and 113-114 are rejected under 35 
U.S.C. 112, first paragraph, because the specification, while being enabling for a 
polypeptide comprising amino acids 615-855 of SEQ ID NO:2, does not reasonably 
provide enablement for a polypeptide comprising any protease domain of any type II 
membrane type serine protease (MTSP) or MTSP1 or a catalytically active portion 
thereof. The specification does not enable any person skilled in the art to which it 
pertains, or with which it is most nearly connected, to make and use the invention 
commensurate in scope with these claims. 

Factors to be considered in determining whether undue experimentation is 
required are summarized in In re Wands, 858 F.2d 731, 8 USPQ2d 1400 (Fed. Cir. 
1988). They include (1) the quantity of experimentation necessary, (2) the amount of 
direction or guidance presented, (3) the presence or absence of working examples, (4) 
the nature of the invention, (5) the state of the prior art, (6) the relative skill of those in 
the art, (7) the predictability or unpredictability of the art, and (8) the breadth of the 
claims. 

Claims 1-3, 5, 9, 19-20, 35-36, 40-42 and 113-114 are drawn to a polypeptide 
comprising a protease or catalytically active portion of type-II membrane-type serine 
protease (MTSP) from any source. Claims 11 and 34 limit the MTSP polypeptide to a 
MTSP1 polypeptide from any source. Therefore, these claims are drawn to 
polypeptides having undefined structure. 

The scope of the claims is not commensurate with the enablement provided by 
the disclosure with regard to the extremely large number of polypeptides comprising a 
protease or catalytically active domain broadly encompassed by the claims. Since the 
amino acid sequence of a protein determines its structural and functional properties, 
predictability of which changes can be tolerated in a protein's amino acid sequence and 
obtain the desired activity requires a knowledge of and guidance with regard to which 
amino acids in the protein's sequence, if any, are tolerant of modification and which are 
conserved (i.e. expectedly intolerant to modification), and detailed knowledge of the 
ways in which the protein's structure relates to its function. However, in this case the 
disclosure is limited to the polypeptide comprising amino acids 615-855 of SEQ ID 
NO:2, or the amino acids of SEQ ID NO:50. 

It would require undue experimentation of the skilled artisan to make and use the 
claimed polypeptides. The specification is limited to teaching the use of polypeptide 
comprising amino acids 615-855 of SEQ ID NO:2 or the amino acids of SEQ ID NO:50 
but provides no guidance with regard to the making of variants and mutants or with 
regard to other uses. In view of the great breadth of the claim, amount of 
experimentation required to make the claimed polypeptides, the lack of guidance, 
working examples, and unpredictability of the art in predicting function from a 
polypeptide primary structure, the claimed invention would require undue 
experimentation. As such, the specification fails to teach one of ordinary skill how to 
use the full scope of the polypeptides encompassed by the claims. 

While enzyme isolation techniques, recombinant and mutagenesis techniques 
are known, and it is routine in the art to screen for multiple substitutions or multiple 
modifications as encompassed by the instant claims, the specific amino acid positions 
within a protein's sequence where amino acid modifications can be made with a 
reasonable expectation of success in obtaining the desired activity/utility are limited in 
any protein and the result of such modifications is unpredictable. In addition, one skilled 
in the art would expect any tolerance to modification for a given protein to diminish with 
each further and additional modification, e.g. multiple substitutions. 

The specification does not support the broad scope of the claims which 
encompass all modifications and variants of a protease or catalytically active domain or 
modifications of amino acids 615-855 of SEQ ID NO:2 because the specification does 
not establish: (A) regions of the protein structure which may be modified without 
affecting MTSP/serine protease activity; (B) the general tolerance of MTSP to 
modification and extent of such tolerance; (C) a rational and predictable scheme for 
modifying any amino acid residue with an expectation of obtaining the desired biological 
function; and (D) the specification provides insufficient guidance as to which of the 
essentially infinite possible choices is likely to be successful. 

Thus, applicants have not provided sufficient guidance to enable one of ordinary 
skill in the art to make and use the claimed invention in a manner reasonably correlated 
with the scope of the claims broadly including protease or catalytically active domains of 
MTSP with an enormous number of amino acid modifications of the MTSP polypeptides 
and of amino acids 615-855 of SEQ ID NO:2. The scope of the claims must bear a 
reasonable correlation with the scope of enablement (In re Fisher, 166 USPQ 19, 24 
(CCPA 1970)). Without sufficient guidance, determination of the serine protease 
domain or the catalytically active domain of MTSP having the desired biological 
characteristics is unpredictable and the experimentation left to those skilled in the art is 
unnecessarily, and improperly, extensive and undue. See In re Wands, 858 F.2d 731, 8 
USPQ2d 1400 (Fed. Cir. 1988). 

In response to the previous Office Action, applicants have traversed the above 
rejection. 

Applicants argue that the level of skill in this art is high and the specification 
teaches structural and functional features sufficient to enable one of skill in the art to 
make and use the single chain polypeptides comprising a catalytically active portion of an 
MTSP protease domain, by providing structure of MTSP polypeptides and their 
protease domains, as well as their conserved structures. Examiner respectfully 
disagrees. The scope of the claims, which are drawn to polypeptides comprising any 
protease domains or any or all catalytically active fragments of said protease domains 
of any or all MTSP or any or all MTSP1, including any or all recombinants, variants and 
mutants of said MTSP or MTSP1, is not commensurate with the enablement provided 
by the disclosure with regard to the extremely large number of polypeptides comprising 
a protease or catalytically active domain broadly encompassed by the claims. Even 
though the structures of some MTSPs are known, the claims are drawn to any or all 
catalytically active fragments of any or all protease domains of any or all MTSP or 
MTSP1. As discussed above, predictability of which changes can be tolerated in a 
protein's amino acid sequence and obtain the desired activity requires a specific 
knowledge of and guidance with regard to which specific amino acids in the protein's 
sequence can be modified such that the modified polypeptide continues to have said 
claimed activity. It is this specific guidance that applicants do not provide. While the art 
may teach in general the structure of MTSP, conserved amino acid sequences, protease 
domains, X-ray crystal structure, etc., such teachings will not reduce the burden of 
undue experimentation on those of ordinary skill in the art. 

Applicants argue that the specification discloses working examples, thus a 
person skilled in the art has sufficient guidance in making the claimed polypeptides. 
Examiner respectfully disagrees. Even though the structures of some MTSPs are taught, 
the claims are not only drawn to polypeptides comprising catalytically active fragments 
of only MTSP1, MTSP3, MTSP4 and MTSP6, but to any or all mutants, variants and 
recombinants of any MTSP. Without specific guidance, those skilled in the art will be 
subjected to undue experimentation of making and testing each of the enormously large 
number of mutants that results from such experimentation. While the art may teach in 
general the structure of MTSP, conserved amino acid sequences, etc., such 
teachings will not reduce the burden of undue experimentation on those of ordinary skill 
in the art. 

Hence the rejection is maintained. 

Applicants argue that it would be unfair, unduly limiting and contrary to the public 
policy upon which the patent laws are based to require applicant to limit the instant 
claims to only one exemplified protease domain. This argument is moot since 
patentability is based on statutes under 35 USC 101, 112, 102 and/or 103. 



Claim Rejections - 35 USC § 102 






The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(a) the invention was known or used by others in this country, or patented or described in a printed 
publication in this or a foreign country, before the invention thereof by the applicant for a patent. 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

(e) the invention was described in (1) an application for patent, published under section 122(b), by 
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States 
only if the international application designated the United States and was published under Article 21(2) 
of such treaty in the English language. 



Claims 1-3, 5, 11-13, 19-20, 34-36, 40-42 and 113-114 are rejected under 35 
U.S.C. 102(b) as being anticipated by Takeuchi et al. (see rejection of the phrase 
"MTSP protease domain or catalytically active fragment thereof is the only portion of the 
single-chain polypeptide from the MTSP" under 35 USC 112, 2nd paragraph above). 

Claims 1-3, 5, 11-13, 19-20 and 34 are drawn to a polypeptide comprising a 
fragment consisting of a serine protease domain of MTSP having the characteristics 
recited in the claims. Claims 35-36 are drawn to a conjugate comprising a polypeptide 
comprising a serine protease domain of MTSP and a targeting agent. Claims 40-42 
and 113-114 are drawn to a solid support comprising a polypeptide comprising a serine 
protease domain of MTSP. 

Takeuchi et al. (Reference IJ : PTO-1449) teaches a polypeptide comprising a 
fragment consisting of a serine protease domain that is 100% identical to amino acids 
615-855 of SEQ ID NO:2 of the instant invention (page 11060, 2nd full paragraph). 
Takeuchi et al. discloses a purified activated protease domain, comprising amino acids 
615-855 of SEQ ID NO:2, confirmed by an N-terminal sequence of the purified, 
activated protease domain yielding the expected VVGGT sequence (Figure 3 and right 
column on page 11057). The MTSP of Takeuchi et al. is not expressed on normal 
endothelial cells (page 11054, last paragraph and page 11055, 2nd full paragraph), is of 
human origin (Figure 1), consists essentially of the protease domain having catalytic 
activity (page 11060, 2nd full paragraph), and is expressed in tumor cells (page 11055, 
top paragraph). 

Takeuchi et al. teaches a catalytically active polypeptide comprising the serine 
protease domain linked to a His-tag (page 11055, 3rd full paragraph, page 11057, 4th full 
paragraph). Takeuchi et al. also teaches a solid support comprising said polypeptide 
(page 11057, 4th full paragraph and Figure 5). Therefore, the teaching of Takeuchi et 
al. anticipates claims 1-3, 5, 11-13, 19-20, 34-36, 40-42 and 113-114. 

Examiner notes that the contents of the reference were made public at the 
National Academy of Sciences colloquium held February 20-21, 1999 (see top of 
reference). 

In response to the previous Office Action, applicants have traversed the above 
rejections. 

Applicants argue that Takeuchi et al. does not anticipate the instant claims 
because it fails to disclose any polypeptides that incorporate all the features of claim 1, 
a single chain polypeptide having an MTSP portion, wherein the MTSP portion is a 
protease domain or a smaller fragment and wherein the MTSP portion has serine 
protease activity. 




Applicants argue that the MT-SP1 of Takeuchi et al. is a full-length protein that 
includes additional MTSP regions other than a protease domain, and therefore, said 
MTSP1 of Takeuchi et al. is not a polypeptide where the only MTSP portion of the 
polypeptide is a protease domain or a smaller catalytically active portion of the protease 
domain. Examiner respectfully disagrees. First, the claim recites "a polypeptide 
comprising a MTSP portion" and the claim does not recite the limitation that the 
polypeptide only consist of MTSP portion. Therefore, a full-length MT-SP1 of Takeuchi 
et al. anticipates the instant claims. Second, in addition to the full-length MT-SP1, 
Takeuchi et al. also discloses a purified activated protease domain, comprising amino 
acids 615-855 of SEQ ID NO:2, confirmed by an N-terminal sequence of the purified, 
activated protease domain yielding the expected VVGGT sequence (Figure 3 and right 
column on page 11057). Even applicants state that Takeuchi et al. discloses "that its 
protease domain has an amino acid sequence containing amino acids 615-855" 
(Remarks page 36) and that "Takeuchi et al. discloses that its polypeptide includes the 
pro-domain and that the pro-domain is cleaved during auto-activation, resulting in a 
protease domain" (page 37). Therefore, said purified, activated protease domain 
anticipates the instant claims. 

Applicants also argue that the reference of Takeuchi et al. does not anticipate the 
instant claims because the "purified protease domain" of Takeuchi et al. includes the 
His-tag sequence and that the polypeptide construct disclosed by Takeuchi et al. 
includes a sequence of 19 amino acids of a portion of the pro-domain and that this 
pro-domain is disulfide bonded to the protease domain. Examiner respectfully disagrees. 
Takeuchi et al. also discloses a purified activated protease domain, comprising amino 
acids 615-855 of SEQ ID NO:2, confirmed by an N-terminal sequence of the purified, 
activated protease domain yielding the expected VVGGT sequence (Figure 3 and right 
column on page 11057 and Figure 6). Further, applicants state that "Takeuchi et al. 
discloses that its polypeptide includes the pro-domain and that the pro-domain is 
cleaved during auto-activation, resulting in a protease domain" (page 37). 

Applicants also argue that the activated protein derived from the expressed His-tag 
amino acids 596-855 of MT-SP1 of Takeuchi et al. is not a single chain polypeptide 
because the protease domain is disulfide bonded to a pro-domain resulting in a two 
chain form. Examiner respectfully disagrees. Takeuchi et al. discloses that the pro-domain 
is disulfide bonded to a protease domain of the full length protein. Contrary to 
applicants' argument, Takeuchi et al. does not teach that the pro-domain is disulfide 
bonded to an activated protease domain. Further, a single chain polypeptide is one 
sequence of amino acids beginning with a carboxyl end and terminating with an amino 
end, wherein the amino acids are connected via peptide bonds. Therefore, even the full 
length MT-SP1 of Takeuchi et al. having disulfide bonds can be construed as a single 
chain polypeptide. 

In conclusion, Takeuchi et al. discloses a purified activated protease domain, 
comprising amino acids 615-855 of SEQ ID NO:2, confirmed by an N-terminal sequence 
of the purified, activated protease domain yielding the expected VVGGT sequence 
(Figure 3 and right column on page 11057 and Figure 6). Further, applicants state that 
"Takeuchi et al. discloses that its polypeptide includes the pro-domain and that the 
pro-domain is cleaved during auto-activation, resulting in a protease domain" (page 37). 
Hence the rejections are maintained. 



Claim Rejections - 35 USC § 102/103 

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(e) the invention was described in (1) an application for patent, published under section 122(b), by 
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States 
only if the international application designated the United States and was published under Article 21(2) 
of such treaty in the English language. 

The following is a quotation of 35 U.S.C. 103(a), which forms the basis for all 
obviousness rejections, set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior 
art are such that the subject matter as a whole would have been obvious at the time the invention was made to 
a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be 
negatived by the manner in which the invention was made. 



Claims 1-3, 5, 10-13 and 34 are rejected under 35 U.S.C. 102(e) as anticipated by 
or, in the alternative, under 35 U.S.C. 103(a) as obvious over O'Brien et al. 

Claims 1-3, 5, 10-13 and 34 are drawn to a polypeptide comprising a serine 
protease domain of MTSP. 

O'Brien et al. (U.S. Patent No. 5,972,616 - reference P- PTO 1449) teaches a 
polypeptide having 100% identity to the full length MTSP1 of SEQ ID NO:2 of the instant 
invention (SEQ ID NO:2, columns 19-24). The properties recited in claims 2-3 are 
inherent properties of MTSP1 taught by O'Brien et al. since the polypeptide of O'Brien 
et al. and the instant invention have identical structure and therefore identical 
properties. 

O'Brien et al. teaches a serine protease domain having proteolytic activity that is 
100% identical to amino acids 615-855 of SEQ ID NO:2 (Figure 2, Figure 10 and SEQ 
ID NO:14). Although the protease domain of O'Brien et al. identified by SEQ ID NO:14 
has not been purified, the protease domain in the reference and the polypeptide claimed 
by the applicants are one and the same. Therefore, the protease domain anticipates 
the instant invention. 

Since the Office does not have facilities for examining and comparing applicant's 
polypeptide with the polypeptide of the prior art, the burden is on the applicant to show a 
novel or unobvious difference between the claimed product and the product of the prior 
art (i.e., that the polypeptide of the prior art does not possess the same material 
structure and functional characteristics of the claimed polypeptide). See In re Best, 562 
F.2d 1252, 195 USPQ 430 (CCPA 1977) and In re Fitzgerald et al., 205 USPQ 594. 

Alternatively, O'Brien et al. teaches a method of expressing polypeptides via a 
vector in host cells. O'Brien et al. also teaches that the protease domain could be 
released and used as a diagnostic which has the potential for a target for therapeutic 
intervention (Column 15, lines 35-38). Therefore, it would have been obvious to one 
having ordinary skill in the art at the time the invention was made to express the 
protease domain of SEQ ID NO:14 and purify the polypeptide. The motivation of making 
such a polypeptide is to use it as a diagnostic which has the potential for a target for 
therapeutic intervention. One of ordinary skill in the art would have had a reasonable 
expectation of success since expression of a heterologous polypeptide is routine in the 
art and O'Brien et al. teaches how to express heterologous polypeptides. 

In response to the previous Office Action, applicants have traversed the above 
rejections. 

Applicants argue that O'Brien et al. does not anticipate any of the instant claims 
because the claims are not directed to a full-length MTSP polypeptide. Examiner 
respectfully disagrees. The claim recites "a polypeptide comprising a MTSP portion" 
and the claim does not recite the limitation that the polypeptide only consist of MTSP 
portion. Therefore, the full-length MT-SP1 of O'Brien et al. anticipates the instant claims. 

Applicants also argue that one of skill in the art would recognize the disclosure of 
the polypeptide of O'Brien as not disclosing a single chain polypeptide. Examiner 
respectfully disagrees. A single chain polypeptide is one sequence of amino acids 
beginning with a carboxyl end and terminating with an amino end, wherein the amino 
acids are connected via peptide bonds. Therefore, the full length MT-SP1 of O'Brien et 
al. can be construed as a single chain polypeptide. 

Applicants argue that one of skill in the art would understand MTSP serine 
proteases to be active only as two chain polypeptides by citing Lu et al. (1999) J. Biol. 
Chem. 272:31293-300 and would not view O'Brien et al. as disclosing a single chain 
polypeptide. Examiner respectfully disagrees. The bibliographic information Lu et al. 
(1999) J. Biol. Chem. 272:31293-300 could not be located through J. Biol. Chem. 
Applicants are urged to supply the reference or the correct bibliographic information. 
Nevertheless, applicants state that "as expressed, the MTSP polypeptide is an inactive 
single-chain zymogen" (Remarks page 42). Therefore, according to applicants, the full 
length MT-SP1 of O'Brien et al. is a single chain polypeptide and therefore, anticipates 
the claimed invention. 

Hence the rejection is maintained. 

Applicants also argue that O'Brien et al. provides no teaching or suggestion of 
smaller fragments having serine protease activity because it does not teach how to 
make a single chain polypeptide that has serine protease activity. Examiner respectfully 
disagrees. O'Brien et al. teaches a method of expressing polypeptides via a vector in 
host cells. It is well within the skill available in the art to purify the protease domain 
since O'Brien et al. identifies the protease domain. Therefore, it would have been 
obvious to one having ordinary skill in the art at the time the invention was made to 
express the protease domain of SEQ ID NO:14 and purify the polypeptide. The 
motivation of making such a polypeptide is to use it as a diagnostic which has the 
potential for a target for therapeutic intervention. One of ordinary skill in the art would 
have had a reasonable expectation of success since expression of a heterologous 
polypeptide is routine in the art and O'Brien et al. teaches how to express heterologous 
polypeptides. 

Applicants again argue that at the time of filing the instant application, one of skill 
in the art would not have had a reasonable expectation of success to express the 
protease domain because art evidences that a single-chained polypeptide would not 
have been expected to have protease activity. Examiner respectfully disagrees. The 
claims are drawn to a polypeptide comprising a fragment consisting of a protease 
domain of SEQ ID NO:2. Therefore, said polypeptide being a single-chained 
polypeptide is an inherent property of said polypeptide since two polypeptides having 
identical structure will have identical function and physical and chemical properties. 
Hence the rejections are maintained. 

Claims 35-36, 40-42 and 113-114 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over O'Brien et al. 

Claims 35-36 are drawn to a conjugate comprising a polypeptide comprising a 
serine protease domain of MTSP and a targeting agent. Claims 40-42 and 113-114 are 
drawn to a solid support comprising a polypeptide comprising a serine protease domain 
of MTSP. 

O'Brien et al. (U.S. Patent No. 6,972,616 - reference P- PTO 1449) teaches a 
polypeptide having 100% identity to the full length MTSP1 of SEQ ID NO:2 of the instant 
invention, as discussed above. O'Brien et al. also teaches that the protease domain 
could be released and used as a diagnostic which has the potential for a target for 
therapeutic intervention (Column 15, lines 35-38). 

O'Brien et al. also teaches a method of making fragments of SEQ ID NO:2 
(Column 9, lines 22-55). O'Brien et al. teaches said fragments linked to another 
polypeptide (Column 9, lines 54-55) and conjugated to bridging molecules (Column 6, 
lines 27-39) for detecting the polypeptide. Assays using polypeptides linked to the 
molecules taught by O'Brien et al. utilize solid supports. 

Therefore, it would have been obvious to one having ordinary skill in the art at 
the time the claimed invention was made to make a polypeptide comprising of the 
serine protease domain of SEQ ID NO:2 taught by O'Brien et al. and to make 
conjugates and solid support comprising of a polypeptide comprised of the serine 
protease domain of SEQ ID NO:2. The motivation of making such a polypeptide is to 
use it as a diagnostic which has the potential for a target for therapeutic intervention. 
The motivation of making conjugates and solid supports comprising of said polypeptide 
is to use the conjugate and solid support in a variety of diagnostic assays. One of 
ordinary skill in the art would have had a reasonable expectation of success since making 
fragments of a polypeptide is routine in the art and O'Brien et al. teaches how to make 
fragments of SEQ ID NO:2. One of ordinary skill in the art would have had a 
reasonable expectation of success since diagnostic assays using conjugates and solid 
supports comprising a polypeptide are very well known, as taught by O'Brien et al. 

Therefore, the above references render claims 35-36 and 40-42 prima facie 
obvious to one of ordinary skill in the art. 

In response to the previous Office Action, applicants have traversed the above 
rejections. Applicants argue that the teachings of O'Brien et al. does not result in the 
instantly claimed compositions because O'Brien et al. does not teach or suggest a 
single chain polypeptide that includes a MTSP protease domain where the polypeptide 
does not include any additional MTSP portions and the polypeptide has serine protease 
activity. O'Brien et al. does teach or suggest a single chain polypeptide comprising a 
MTSP portion, wherein the MTSP portion is a protease domain and wherein the MTSP 
portion has serine protease activity and wherein the MTSP portion is the only portion of 
the polypeptide because O'Brien et al. identifies the serine protease domain and one 
having ordinary skill in the art at the time the invention was filed would have been 
motivated to purify the serine protease domain of O'Brien et al. as discussed above. 
Hence the rejection is maintained. 

Claims 19-20 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
O'Brien et al. and Estell et al. in view of Takeuchi et al. 

Claims 19-20 are drawn to a polypeptide comprising the serine protease domain 
of a MTSP wherein free Cys residues are substituted with Ser residues. 

O'Brien et al. teaches a serine protease domain of a MTSP polypeptide, as 
discussed above. 

The reference of O'Brien et al. does not teach a serine protease domain of a 
MTSP polypeptide wherein free Cys residues have been replaced with Ser residues. 

It is well known in the art that proteins form disulfide bonds via the SH groups of 
Cys residues. Upon making a polypeptide comprising a serine protease domain, a Cys 
residue which normally forms disulfide bonds in the full length polypeptide may be left 
free. For example, Takeuchi et al. (Reference IJ: PTO-1449) teaches that Cysteine at 
position 731 of SEQ ID NO:2 normally forms a disulfide bond with a Cys residue in the 
pro-protease domain (see page 11060, top left paragraph and Figures 1 and 2). 

Cys residues are sensitive to oxidation due to their SH side group. Estell et al. 
(U.S. Patent No. 5,346,823) teaches that Cys residues are replaced with Ser residues to 
decrease a polypeptide's susceptibility to oxidation (Abstract and Column 10, lines 34- 
38). Ser residues have similar side chains as Cys residues and substitution of a Cys 
residue with a Ser residue is a conservative substitution. 

Therefore, it would have been obvious to one having ordinary skill in the art at 
the time the claimed invention was made to replace free Cys residues in the protease 
domain taught by O'Brien et al. with a Ser residue. One of ordinary skill in the art would 
be motivated to make such a change in order to enhance stability of the polypeptide. 
One of ordinary skill in the art would have had a reasonable expectation of success 
since Estell et al. teaches successful decrease of a protein's susceptibility to oxidation 
by substituting residues sensitive to oxidation with conservative substitutions. 

Therefore, the above references render claims 1 and 16, 18-20, 34 and 137 
prima facie obvious to one of ordinary skill in the art. 

In response to the previous Office Action, applicants have traversed the above 
rejections. Applicants argue that the combination of the teachings of O'Brien et al. with 
the teachings of Estell et al. and Takeuchi et al. does not result in the instantly claimed 
methods because O'Brien et al. does not teach or suggest a single chain polypeptide 
that includes a MTSP protease domain where the polypeptide does not include any 
additional MTSP portions and the polypeptide has serine protease activity and that 
neither Takeuchi et al. nor Estell et al. remedy the defects of O'Brien et al. First, the 
claims are product claims and not method claims. Second, O'Brien et al. does teach or 
suggest a single chain polypeptide comprising a MTSP portion, wherein the MTSP 
portion is a protease domain and wherein the MTSP portion has serine protease activity 
and wherein the MTSP portion is the only portion of the polypeptide because O'Brien et 
al. identifies the serine protease domain and one having ordinary skill in the art at the 
time the invention was filed would have been motivated to purify the serine protease 
domain of O'Brien et al. as discussed above. 

Applicants argue that Takeuchi et al. teaches that every cysteine residue of the 
protein is disulfide bonded and therefore Takeuchi et al. does not teach or suggest an 
MTSP protease domain having a free Cys residue. Examiner respectfully disagrees. 
Figure 4, to which applicants are referring, illustrates disulfide bonds of cysteine residues of the 
full length MTSP, for example, the Cys at position 830 is disulfide bonded to Cys at 
position 191. 

Hence the rejections are maintained. 



None of the claims are in condition for allowance. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Yong Pak whose telephone number is 571-272-0935. 
The examiner can normally be reached 6:30 A.M. to 5:00 P.M. Monday through 
Thursday. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Ponnathapu Achutamurthy can be reached on 571-272-0928. The fax 
phone numbers for the organization where this application or proceeding is assigned 
are 571-273-8300 for regular communications and 703-872-9307 for After Final 
communications. 

Any inquiry of a general nature or relating to the status of this application or 
proceeding should be directed to the receptionist whose telephone number is 571-272-1600. 

Yong D. Pak 

Patent Examiner 1652 







